Social Network Analysis with sna

Carter T. Butts
University of California, Irvine

Journal of Statistical Software, February 2008, Volume 24, Issue 6. http://www.jstatsoft.org/

Abstract

Modern social network analysis (the analysis of relational data arising from social systems) is a computationally intensive area of research. Here, we provide an overview of a software package which provides support for a range of network analytic functionality within the R statistical computing environment. General categories of currently supported functionality are described, and brief examples of package syntax and usage are shown.

Keywords: social network analysis, graphs, sna, statnet, R.

1. Introduction and overview

Far more so than many other domains of social science, modern social network analysis (SNA) is a computationally intensive affair. Techniques based on eigensolutions (e.g., eigenvector and Bonacich centrality, multidimensional scaling), combinatorial optimization (e.g., permutation search in equivalence analysis, structural distance/covariance calculation), shortest-path computation (e.g., betweenness centrality, network diameter), and Monte Carlo integration (e.g., QAP and CUG tests) are central to the practice of SNA, and, indeed, the overwhelming majority of current research in this area could not be performed without access to inexpensive computational tools.

This dependence on computation for research in social network analysis has helped to spawn a wide array of software packages to perform network analytic tasks. From generalist tools such as UCINET (Borgatti et al. 1999), Pajek (Batagelj and Mrvar 2007), STRUCTURE (Burt 1991), StOCNET (Huisman and van Duijn 2003), MultiNet (Richards and Seary 2006), and GRADAP (Stokman and Van Veen 1981) to more specialized applications such as netdraw (Borgatti 2007), SIENA (Snijders 2001), and KrackPlot (Krackhardt et al. 1994) (to name a few), a variety of software solutions are available for the network analyst.

While each of these packages has its own assets, there continues to be a need for network analysis software which is simultaneously:

1. General in coverage, incorporating a range of different network analytic techniques;
2. Easily extensible, to allow for the timely incorporation of new methods and/or refinements;
3. Well-integrated with general purpose statistical, computational, and visualization tools, so as to facilitate the use of network analysis in conjunction with both end-user extensions and broader social science methodology;
4. Based on an open codebase, which is available for inspection (and hence emulation, correction, and improvement) by the network community;
5. Portable, to allow use by researchers on a variety of computing platforms; and
6. Freely available to network researchers, so as to encourage its use among the widest possible range of scientists, practitioners, and students.

This "wish list" of attributes would seem to be a great deal to ask of any single, standalone program; the emergence of open statistical computing platforms such as R (R Development Core Team 2007), however, has provided a feasible means of realizing such objectives. Using R (which is itself free software in the Stallmanian sense; see Stallman 2002), researchers can easily produce and share packages which supply specialized functionality, but which are interoperable with other statistical computing tools. In this vein, the sna package was created as a mechanism for fulfilling the above objectives within the R environment. Additional motivations for the introduction of sna were to encourage the migration of the social network community to open source and/or free software solutions, to facilitate the creation of a shared framework for dissemination of new methodological developments, to further the development of statistical network analysis methods by network analysts, and to ease the integration of network methods with those of "standard" statistical analysis.

1.1. Package history

sna began life as a loose collection of S routines (called "Various Useful Tools for Network Analysis in S," or network.S.tools), written by the author, which were disseminated locally to social network researchers in and around the research community at Carnegie Mellon University and the University of Pittsburgh. The first external use of the toolkit of which the author is aware was the netlogit analysis employed by Ingram and Roberts (2000). The first version of the collection to be generally disseminated (version 0.1) was released in August of 2000, with the first R package version (sna version 0.3) appearing in May of 2001. Multiple releases followed over subsequent years, with the package reaching the "1.0" landmark in August of 2005. Development has been ongoing; as of the time of this writing, the package is on version 1.5.

1.2. sna and statnet

As noted above, a major goal in introducing sna was the creation of a foundation for ongoing development of tools within the network analysis community. The statnet project (Handcock et al. 2003) represents the latest incarnation of that objective (much as BioConductor, Gentleman et al. 2004, serves as a site for tool development within the bioinformatics community); in some sense, then, statnet is the natural "successor" to sna. Reflecting this relationship, sna is now considered to be part of the statnet project, and is fully interoperable with other statnet packages (including network). sna may still be employed as a stand-alone package, however, for users who do not require the full range of functionality provided by statnet.

1.3. Functionality

At present, the sna package includes over 125 functions for the manipulation and analysis of network data. Supported functionality includes:

• Functions to compute descriptive indices at the graph or node level. This includes centrality and centralization indices, measures of hierarchy and prestige, brokerage, density, reciprocity, transitivity, connectedness, and the like, as well as dyad, triad, path, and cycle census statistics. Stand-alone routines to facilitate the comparison of index values across graphs via conditional uniform graph (CUG) tests are included.

• Functions to compute geodesic distances, component structure and distribution, and structure statistics (in the sense of Fararo and Sunshine 1964), and to identify isolates.

• Functions for positional and role analysis, including structural equivalence and blockmodeling.

• Functions for exploratory edge set comparison, in the paradigm of Butts and Carley (2005). This includes structural covariance/correlation and distance routines, as well as tools for scaling and visualization of graph sets. Network regression (Krackhardt 1988), canonical correlation analysis, and logistic network regression are also supported; QAP (Hubert 1987; Krackhardt 1987b) and CUG tests are currently implemented for all three approaches.

• Functions to generate graph-valued deviates from various stochastic processes. So-called Erdős-Rényi graphs, inhomogeneous Bernoulli graphs, and dyad census conditioned graphs are supported, as are graphs produced by Watts-Strogatz rewiring processes (Watts and Strogatz 1998) and biased net models (Skvoretz et al. 2004; Rapoport 1957).

• Functions to fit network autocorrelation (also known as spatial autocorrelation; see Anselin 1988) and biased net models.

• Functions for network inference (i.e., inferring networks from multiple reports containing missing and/or error-prone data). This includes heuristic estimators such as Krackhardt's (Krackhardt 1987a) locally aggregated structure estimators and the central graph (Banks and Carley 1994), as well as model-based methods such as the Romney-Batchelder consensus model (Romney et al. 1986) and the error-rate models of Butts (2003).

• Functions for visualization and manipulation of network data (in adjacency matrix form). Standard graph layout methods such as those of Fruchterman and Reingold (1991) and Kamada and Kawai (1989), general multidimensional scaling/eigenstructure methods, and "target" diagrams (Brandes et al. 2003) are included by default, and custom layout routines are also supported. Functions are included to facilitate common tasks such as extracting neighborhoods and egocentric networks, symmetrization, application of functions to attribute information on neighborhoods (e.g., computing neighbors' mean attributes), dichotomization, permutation/relabeling, and the creation of interval graphs from spell data. Data import/export is supported for several basic file formats.

The above includes many of the methods of what is sometimes called "classical" social network analysis (exemplified by Wasserman and Faust (1994), whose presentation is now canonical), as well as some more recent contributions to the literature. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

1.4. Terminology and data representation

Because sna is a special-purpose toolkit dedicated to social network analysis, describing its functionality requires us to refer to standard SNA concepts and methods; readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below.

Throughout this paper, we will be concerned with relational data, consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints, with the first element known as the tail (or sender) and the second known as the head (or receiver) in the ordered case. An edge whose endpoints are identical is called a loop. The combination of an edge set, E, with vertex set, V, is said to be a graph (denoted G = (V, E)). The size or order of a graph is the number of elements in its vertex set (denoted |V|, where |·| is the cardinality operator).

Specific types of graphs may be identified via the constraints satisfied by E. If the elements of E are unordered multisets, G is said to be an undirected graph; if edges are ordered multisets, by contrast, G is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex v is called the neighborhood of v (denoted N(v)). In the directed case, we distinguish between the set of vertices sending edges to v (the in-neighborhood, or N⁻(v)) and the set of vertices receiving edges from v (the out-neighborhood, or N⁺(v)). A graph (directed or otherwise) is simple if it has no loops and if there exists no edge having multiplicity greater than one. Finally, a graph's edge set may be associated with a set of variables, such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary unvalued case.

It is worth noting that use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field's strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called "points" or "nodes" (or, in social contexts, "actors" or "agents"). Likewise, edges may be called "lines," "ties," or (if directed) "arcs." The term "network" is often used generically to refer to any relational structure; in other cases, it may be reserved to refer to the actually existing relational structure, with "graph" being employed for that structure's formal representation. In the latter instance, "tie" is frequently used as the corresponding term for an actually existing relationship, with "edge" denoting the formal representation of that relationship. While such terminological subtleties are not required to use sna, an awareness of them may reduce confusion among users seeking to make use of the literature cited within the package manual.

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna's functions typically default to the assumption that one's data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix, A, whose elements are defined such that A_ij is the value of the (i, j) edge (or {i, j} edge, in the undirected case) in the corresponding graph. By convention, A_ij is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects, or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (Koenker and Ng 2007, SparseM) objects may be preferred. sna accepts all three such data types interchangeably.
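As a small illustration of the adjacency matrix convention, the following base-R sketch builds a matrix for a made-up four-vertex digraph and reads off in- and out-neighborhoods. The edge list and the helper names out_nbhd and in_nbhd are hypothetical and are not part of sna.

```r
# Hypothetical 4-vertex digraph with edges 1->2, 1->3, 2->3, 4->1
n <- 4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 1))

A <- matrix(0, n, n)
A[edges] <- 1   # two-column index: row = tail (sender), column = head (receiver)

# Illustrative helpers (not sna functions)
out_nbhd <- function(A, v) which(A[v, ] != 0)   # vertices receiving an edge from v
in_nbhd  <- function(A, v) which(A[, v] != 0)   # vertices sending an edge to v

out_nbhd(A, 1)    # vertices 2 and 3
in_nbhd(A, 3)     # vertices 1 and 2
isSymmetric(A)    # FALSE: the matrix encodes a directed graph
```

A symmetric matrix would correspond to an undirected graph under this convention, and a nonzero diagonal entry to a loop.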

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternately, it is also possible to specify multiple graphs by means of a list. This allows for the user to pass graph sets of varying orders, where required. Within a graph list, single adjacency matrices, adjacency arrays, network, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
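The two multi-graph representations described above can be sketched in base R as follows. The graphs are arbitrary random matrices generated only for illustration; the object names are hypothetical.

```r
set.seed(1)
n <- 5

# Two arbitrary digraphs of the same order
g1 <- matrix(rbinom(n * n, 1, 0.3), n, n)
g2 <- matrix(rbinom(n * n, 1, 0.6), n, n)

# Graph stack: the FIRST dimension indexes the component adjacency matrices
stack <- array(0, dim = c(2, n, n))
stack[1, , ] <- g1
stack[2, , ] <- g2
dim(stack)        # 2 5 5
stack[2, , ]      # recovers the second graph

# Graph list: allows graphs of different orders to travel together
g_small <- matrix(c(0, 1, 1, 0), 2, 2)
graph_list <- list(g_small, stack)
length(graph_list)
```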

Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
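The idea of an interval graph can be illustrated with a short base-R sketch: two spells are adjacent when their intervals overlap. This is not the interval.graph implementation; the spell data are invented, and using the length of the overlap as the edge value is an assumption made here for illustration.

```r
# Hypothetical spell data: one row per spell (onset, terminus)
spells <- cbind(onset = c(0, 2, 5, 9), terminus = c(3, 6, 8, 10))
n <- nrow(spells)

# Valued overlap matrix: length of the intersection of spells i and j
overlap <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      shared <- min(spells[i, "terminus"], spells[j, "terminus"]) -
                max(spells[i, "onset"], spells[j, "onset"])
      overlap[i, j] <- max(0, shared)
    }
  }
}

# Simple (unvalued) interval graph: tie present wherever the overlap is positive
ig <- (overlap > 0) * 1
ig
```

With these spells, only the pairs (1, 2) and (2, 3) overlap, so the resulting undirected graph has two edges and spell 4 is an isolate.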

2. Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1. Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a "workhorse" function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix; although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library, and fix the random seed (for reproducibility):

R> library(sna)
R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
R> gden(g)

[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs, via a resampling procedure.

gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an (i, j) edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10, in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

R> g.p <- sapply((1:10) / 10, rep, 10)
R> g <- rgraph(10, tprob = g.p)
R> g

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0

R> apply(g, 2, mean)

 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)

[1] 12 12 12 12 12
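For readers who want to see what these two models amount to, the following base-R sketch mimics them without calling sna. It is a minimal illustration rather than the package's implementation; draw_gnm is a hypothetical helper name, and the seed is arbitrary.

```r
set.seed(7)
n <- 10

# Inhomogeneous Bernoulli digraph: edge (i, j) present with probability p[i, j]
p <- sapply((1:n) / n, rep, n)            # column j holds j / n, so p[i, j] = j / n
g <- matrix(rbinom(n * n, 1, p), n, n)    # one independent Bernoulli trial per cell
diag(g) <- 0                              # disallow loops

# Each column has n - 1 free cells out of n, hence the 0.9 * (j / 10) expectation
expected_col_means <- ((n - 1) / n) * (1:n) / n
rbind(observed = colMeans(g), expected = expected_col_means)

# Uniform digraph conditional on edge count: pick m of the n(n - 1) non-loop cells
draw_gnm <- function(n, m) {
  h <- matrix(0, n, n)
  cells <- which(row(h) != col(h))        # linear indices of off-diagonal cells
  h[sample(cells, m)] <- 1
  h
}
h <- draw_gnm(n, 12)
sum(h)                                    # exactly 12 edges in every draw
```

The contrast is that the Bernoulli model fixes only the expected number of edges, whereas the edge-count model fixes the realized number exactly.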

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0

R> t10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0

R> n10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
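The dyad census underlying these three cases can be tallied by hand in base R, as sketched below. dyad_census is a hypothetical helper written for this illustration (it is not an sna function), it assumes a dichotomous 0/1 matrix, and the tournament built here is one particular transitive tournament rather than a random draw.

```r
# Count mutual, asymmetric, and null dyads of a dichotomous digraph
dyad_census <- function(g) {
  ut <- upper.tri(g)
  a <- g[ut]        # state of the i -> j edge for each dyad with i < j
  b <- t(g)[ut]     # state of the reverse edge j -> i for the same dyad
  c(mut  = sum(a == 1 & b == 1),
    asym = sum(a != b),
    null = sum(a == 0 & b == 0))
}

n <- 10
k10 <- 1 - diag(n)                      # complete digraph: every dyad mutual
n10 <- matrix(0, n, n)                  # null graph: every dyad null
t10 <- matrix(0, n, n)
t10[upper.tri(t10)] <- 1                # a transitive tournament: i -> j whenever i < j

dyad_census(k10)
dyad_census(t10)
dyad_census(n10)
```

In each case all choose(10, 2) = 45 dyads fall into a single class, matching the mut, asym, and null arguments used above.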

When not in "exact" mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])

[1] 0.1482828

R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])

[1] 0.04646465

R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])

[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
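The figures quoted for the Bernoulli comparison follow from a short calculation, reproduced here in R. The variable names are illustrative; the only inputs are the dyad type probabilities used in the rguman call above.

```r
p_mut  <- 0.15
p_asym <- 0.05

# Expected density: a mutual dyad fills both of its edge slots,
# an asymmetric dyad fills one of the two
density <- p_mut + p_asym / 2            # 0.175

# Under a Bernoulli graph with that density, the two edges of a dyad are independent
bern_mut  <- density^2                   # about 0.03
bern_asym <- 2 * density * (1 - density) # about 0.29
bern_asym / bern_mut                     # about 9.4

# Under the multinomial dyad model the ordering is reversed
p_asym / p_mut                           # asymmetric dyads are a third as common as mutuals
```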

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce "parent" or reciprocity biases (π), "sibling" or shared partner biases (σ), and "double role" biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+    d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0

R> sum(g - g2) == 0

[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))

[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
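In base R, a simultaneous row/column permutation is a single indexing operation, which makes the invariance claim easy to check. The sketch below uses an arbitrary random digraph and is meant only as an illustration of the idea, not of rmperm's internals.

```r
set.seed(3)
n <- 6
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0

# Relabel vertices: apply the SAME random order to rows and columns
o  <- sample(n)
gp <- g[o, o]

# Unlabeled structure is unchanged: same edge count, same degree sequences
sum(g) == sum(gp)
all(sort(rowSums(g)) == sort(rowSums(gp)))   # out-degrees
all(sort(colSums(g)) == sort(colSums(gp)))   # in-degrees

# The relabeling is reversible, so no information is lost
all(gp[order(o), order(o)] == g)
```

Permuting rows and columns with different orders would, by contrast, generally change the structure of the graph.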

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc.

Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)

[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple, exploratory applications, but does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
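The change-score idea can be sketched in a few lines of base R. The helper names edge_perturbation and count_mutual are hypothetical and written only for this illustration; the sketch is not the package's code, and the statistic is assumed to take an adjacency matrix as its first argument.

```r
# Difference in a graph statistic between forcing edge (i, j) on and forcing it off
edge_perturbation <- function(g, i, j, stat, ...) {
  g_on  <- g; g_on[i, j]  <- 1
  g_off <- g; g_off[i, j] <- 0
  stat(g_on, ...) - stat(g_off, ...)   # the statistic is evaluated twice in full
}

# Example statistic: number of mutual dyads (assumes a loopless 0/1 matrix)
count_mutual <- function(g) sum(g * t(g)) / 2

g <- matrix(0, 4, 4)
g[2, 1] <- 1
g[1, 3] <- 1

edge_perturbation(g, 1, 2, count_mutual)   # 1: adding 1 -> 2 reciprocates 2 -> 1
edge_perturbation(g, 3, 4, count_mutual)   # 0: 4 -> 3 is absent, so no mutual forms
```

The sketch also shows where the inefficiency comes from: the mutual-dyad change score depends only on the state of the reverse edge, yet the generic wrapper recomputes the whole statistic twice.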

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
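A restricted permutation of the kind described for rperm can be sketched in base R by shuffling positions separately within each class. restricted_perm is a hypothetical helper for this illustration and does not reproduce rperm's interface; the class labels are invented.

```r
# Random permutation vector that only exchanges vertices within the same class
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    if (length(idx) > 1) {
      # uniform shuffle of the positions belonging to this class
      perm[idx] <- idx[sample.int(length(idx))]
    }
  }
  perm
}

set.seed(11)
classes <- c(1, 1, 1, 2, 2, 3)   # hypothetical exchangeability classes
o <- restricted_perm(classes)
o
all(classes[o] == classes)       # TRUE: every vertex maps to one of its own class
```

Because each class is shuffled independently and uniformly, every permutation that respects the classes is equally likely. The resulting vector can be applied to an adjacency matrix g as g[o, o].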

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex v in graph G is defined as G[v ∪ N(v)] (i.e., the subgraph of G induced by v and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
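The induced-subgraph definition translates directly into base R. The following sketch is not ego.extract itself; ego_net and its type argument values are hypothetical names chosen for this illustration, and the example digraph is invented.

```r
# Subgraph induced by v and its (incoming, outgoing, or combined) neighborhood
ego_net <- function(g, v, type = c("combined", "incoming", "outgoing")) {
  type <- match.arg(type)
  nb <- switch(type,
    outgoing = which(g[v, ] != 0),
    incoming = which(g[, v] != 0),
    combined = which(g[v, ] != 0 | g[, v] != 0))
  keep <- union(v, nb)             # ego first, then its neighbors
  g[keep, keep, drop = FALSE]
}

# Hypothetical digraph: 1 -> 2, 3 -> 1, 2 -> 3, 4 -> 5
g <- matrix(0, 5, 5, dimnames = list(as.character(1:5), as.character(1:5)))
g[1, 2] <- g[3, 1] <- g[2, 3] <- g[4, 5] <- 1

ego_net(g, 1)                # vertices 1, 2, 3, including the alter-alter tie 2 -> 3
ego_net(g, 1, "outgoing")    # vertices 1 and 2 only
ego_net(g, 1, "incoming")    # vertices 1 and 3 only
```

Note that the ego net retains ties among the alters, not just ties involving ego.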

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
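The pattern of applying a function to neighbors' covariates can be sketched in base R as below. The helper neighbor_apply is hypothetical and handles only first-order neighborhoods; the covariate age is invented purely for illustration.

```r
# Apply a function to the covariate values of each vertex's neighbors.
# Hypothetical base R helper covering first-order neighborhoods only.
neighbor_apply <- function(adj, covariate, fun,
                           type = c("out", "in", "combined")) {
  type <- match.arg(type)
  rel <- switch(type,
                out = adj,
                "in" = t(adj),
                combined = (adj + t(adj)) > 0)
  diag(rel) <- 0   # a vertex is not counted as its own neighbor
  sapply(seq_len(nrow(rel)), function(i) fun(covariate[rel[i, ] > 0]))
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[1, 3] <- adj[2, 3] <- adj[4, 1] <- 1
age <- c(30, 40, 50, 20)   # illustrative covariate

out_size <- neighbor_apply(adj, rep(1, 4), sum, "out")   # equals outdegree
mean_age <- neighbor_apply(adj, age, mean, "combined")   # mean age of alters
```

Summing a vector of ones over each neighborhood recovers degree, which is the trivial case mentioned in the text; any other summary function can be substituted for sum or mean.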

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $i,j$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
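The partial and cumulative definitions can be illustrated with a base R sketch built on geodesic distances. The helpers geodesic_dist and nbhd are hypothetical and intended for small graphs only; excluding self-pairs from the cumulative neighborhood is an assumption of this sketch rather than a documented property of sna's neighborhood function.

```r
# Geodesic distances via repeated matrix multiplication (small graphs only).
# Hypothetical helper; not the algorithm used by sna.
geodesic_dist <- function(rel) {
  n <- nrow(rel)
  rel <- (rel > 0) * 1
  diag(rel) <- 0
  dist <- matrix(Inf, n, n)
  diag(dist) <- 0
  reach <- diag(n)                      # walks of length 0
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% rel > 0) * 1    # pairs joined by a walk of length k
    dist[reach > 0 & is.infinite(dist)] <- k
  }
  dist
}

# Partial (distance == k) or cumulative (0 < distance <= k) neighborhood.
nbhd <- function(adj, k, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  rel <- switch(type,
                out = adj,
                "in" = t(adj),
                total = (adj + t(adj)) > 0)
  d <- geodesic_dist(rel)
  if (partial) (d == k) * 1 else (d > 0 & d <= k) * 1
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 3] <- adj[3, 4] <- 1      # directed path 1 -> 2 -> 3 -> 4

p2 <- nbhd(adj, 2, "out")                     # pairs at distance exactly 2
c2 <- nbhd(adj, 2, "out", partial = FALSE)    # pairs at distance 1 or 2
```

On the directed path used here, the order-2 partial neighborhood contains only the pairs (1, 3) and (2, 4), while the cumulative version also retains the three original edges.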

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]

$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x){gden(x[-1, -1])})

         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE

R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))

[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function.

R> g <- rgraph(10, tp = 2 / 9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout.*), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (nine panels, "Partial Neighborhood of Order 1" through "Partial Neighborhood of Order 9"); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout.* functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (nine panels, "Cumulative Neighborhood of Order 1" through "Cumulative Neighborhood of Order 9"); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here.

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: "Default", "Curved Edges", "MDS Layout", "Circular Layout", "Sociomatrix", "Multiple Options").

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions (panels: "gplot.vertex Example", "gplot.arrow Example", "gplot.loop Example").

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3-5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as $c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)}$, where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v', v, v'', G)}{g(v', v'', G)}$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by $c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')}$, where $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
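The three Freeman definitions can be checked on a toy graph with the following base R sketch. The helpers geodesics, degree_out, closeness_c, and betweenness_c are hypothetical and are not sna's implementations; they rely on matrix powers, so they are suitable only for small graphs, and betweenness is summed over ordered pairs of third parties.

```r
# Geodesic distances and geodesic counts via matrix powers (small graphs only).
geodesics <- function(adj) {
  n <- nrow(adj)
  adj <- (adj > 0) * 1
  diag(adj) <- 0
  dist <- matrix(Inf, n, n)
  diag(dist) <- 0
  count <- diag(n)                 # one trivial geodesic from a vertex to itself
  walks <- diag(n)
  for (k in seq_len(n - 1)) {
    walks <- walks %*% adj         # number of walks of length k
    new <- walks > 0 & is.infinite(dist)
    dist[new] <- k                 # first length at which a walk exists
    count[new] <- walks[new]       # walks of geodesic length are geodesics
  }
  list(dist = dist, count = count)
}

degree_out <- function(adj) rowSums(adj > 0)

closeness_c <- function(adj) {
  d <- geodesics(adj)$dist
  (nrow(adj) - 1) / rowSums(d)     # 0 whenever some vertex is unreachable
}

betweenness_c <- function(adj) {
  g <- geodesics(adj)
  n <- nrow(adj)
  sapply(seq_len(n), function(v) {
    total <- 0
    for (a in setdiff(seq_len(n), v)) {
      for (b in setdiff(seq_len(n), c(v, a))) {
        # v lies on an (a, b) geodesic iff the distances add up exactly
        on_geodesic <- is.finite(g$dist[a, b]) &&
          g$dist[a, v] + g$dist[v, b] == g$dist[a, b]
        if (on_geodesic)
          total <- total + g$count[a, v] * g$count[v, b] / g$count[a, b]
      }
    }
    total
  })
}

# Undirected star on 4 vertices, centered on vertex 1
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star <- star + t(star)
```

On the star, the center has degree 3, closeness 1, and betweenness 6 (one for each ordered pair of leaves), while every leaf has betweenness 0.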

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter, $\alpha$, is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
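The limiting behavior of the Bonacich measure can be verified numerically with a short base R sketch. The helper bonacich_power is hypothetical and is not sna's bonpow; scaling the scores so that their sum of squares equals the number of vertices is an assumption of this sketch.

```r
# Bonacich power, alpha * (I - beta * A)^{-1} A 1, with alpha chosen here so
# that the sum of squared scores equals the number of vertices (an assumption
# of this sketch).
bonacich_power <- function(adj, beta) {
  n <- nrow(adj)
  raw <- as.vector(solve(diag(n) - beta * adj, adj %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

# Undirected triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0), 4, 4, byrow = TRUE)

bp0 <- bonacich_power(adj, 0)          # beta = 0: proportional to degree
ratio <- bp0 / rowSums(adj)

lambda1 <- max(eigen(adj)$values)      # principal eigenvalue
bp_near <- bonacich_power(adj, 0.999 / lambda1)
ev <- abs(eigen(adj)$vectors[, 1])     # eigenvector centrality
```

With beta set to 0 the ratio of scores to degree is constant across vertices, and with beta just below the reciprocal of the principal eigenvalue the scores are nearly collinear with the principal eigenvector.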

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
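The five role definitions translate directly into a triad count, as in the following base R sketch. The helper brokerage_counts is hypothetical and returns raw counts only; unlike sna's brokerage function, it computes no expectations or z-scores.

```r
# Count Gould-Fernandez brokerage roles for each vertex of a digraph.
# Hypothetical base R helper, not sna's brokerage function.
brokerage_counts <- function(adj, state) {
  n <- nrow(adj)
  roles <- c("coordinator", "itinerant", "gatekeeper",
             "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {                 # j is the brokering vertex
    for (i in which(adj[, j] > 0)) {      # i -> j
      for (k in which(adj[j, ] > 0)) {    # j -> k
        # Skip loops, degenerate triads, and pairs already tied directly
        if (i == j || k == j || i == k || adj[i, k] > 0) next
        role <- if (state[i] == state[j] && state[j] == state[k]) {
          "coordinator"
        } else if (state[i] == state[k]) {
          "itinerant"
        } else if (state[j] == state[k]) {
          "gatekeeper"
        } else if (state[i] == state[j]) {
          "representative"
        } else {
          "liaison"
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, total = rowSums(out))
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 3] <- adj[2, 4] <- adj[4, 1] <- 1
state <- c(1, 1, 1, 2)   # vertices 1-3 in group 1, vertex 4 in group 2

b <- brokerage_counts(adj, state)
```

In this toy graph, vertex 2 brokers the triads (1, 2, 3) and (1, 2, 4) as coordinator and representative respectively, vertex 1 acts as gatekeeper for (4, 1, 2), and vertex 4 acts as itinerant broker for (2, 4, 1).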

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $(i, j, k) \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
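The equivalence of the summed-deviation and max-minus-mean forms of the Freeman index, along with the normalization step, can be checked with a brief base R sketch. The helper names are hypothetical; the normalizing constant $(n-1)(n-2)$ used here is the star-graph maximum for degree on undirected graphs without loops, which is an assumption specific to this example.

```r
# Freeman centralization of a centrality score vector, in the summed-deviation
# form and in the rewritten max-minus-mean form.
centralization_raw <- function(scores) sum(max(scores) - scores)
centralization_alt <- function(scores) {
  length(scores) * (max(scores) - mean(scores))
}

# Degree centralization normalized by its value on the star graph of the same
# order (assumes an undirected graph without loops).
degree_centralization <- function(adj) {
  n <- nrow(adj)
  deg <- rowSums(adj > 0)
  centralization_raw(deg) / ((n - 1) * (n - 2))
}

# Undirected star on 5 vertices
star <- matrix(0, 5, 5)
star[1, 2:5] <- 1
star <- star + t(star)

# Undirected ring on 5 vertices
ring <- matrix(0, 5, 5)
ring[cbind(1:5, c(2:5, 1))] <- 1
ring <- ring + t(ring)

star_deg <- rowSums(star)
```

The star attains the maximum normalized score of 1, while the ring, in which all vertices are automorphically equivalent, scores 0.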

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
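The distinction between dyadic and edgewise reciprocity, and the weak form of transitivity, can also be worked through by hand. The following base R sketch uses hypothetical helpers (recip_dyadic, recip_edgewise, trans_weak) that follow the verbal definitions given earlier; they are not sna's grecip or gtrans and ignore missing data and loops.

```r
# Dyadic reciprocity: fraction of dyads that are mutual or null.
recip_dyadic <- function(adj) {
  ut <- upper.tri(adj)
  mean(adj[ut] == t(adj)[ut])
}

# Edgewise reciprocity: fraction of edges that are reciprocated.
recip_edgewise <- function(adj) {
  diag(adj) <- 0
  sum(adj * t(adj)) / sum(adj)
}

# Weak transitivity: among ordered triads with i -> j and j -> k
# (i, j, k distinct), the fraction that also have i -> k.
trans_weak <- function(adj) {
  diag(adj) <- 0
  two_paths <- adj %*% adj   # number of i -> j -> k two-paths
  diag(two_paths) <- 0       # exclude i == k
  sum(two_paths * adj) / sum(two_paths)
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- adj[2, 3] <- adj[1, 3] <- adj[3, 4] <- 1

rd <- recip_dyadic(adj)     # 3 of 6 dyads are mutual or null
re <- recip_edgewise(adj)   # 2 of 5 edges are reciprocated
tw <- trans_weak(adj)       # 2 of 4 two-paths are closed
```

The two reciprocity scores differ because null dyads count as symmetric in the dyadic form but contribute nothing to the edgewise form, which is why sparse graphs can show high dyadic and zero edgewise reciprocity.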

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+     n <- NROW(dat)
+     if(tmaxdev)
+         return((n-1) * choose(n-1, 2))
+     odeg <- degree(dat, cmode = "outdegree")
+     choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
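The calling convention just described can be imitated outside of sna. The following is a minimal base R sketch, assuming a Freeman-style definition (summed deviations from the maximum score, divided by the theoretical maximum summed deviation). The names `outdeg_index` and `centralization_sketch` are hypothetical and are not part of sna.

```r
# Hypothetical node-level index: outdegree for a loopless digraph.
# With tmaxdev = TRUE it returns the largest possible summed deviation,
# which is attained by an out-star: (n - 1) * (n - 1).
outdeg_index <- function(adj, tmaxdev = FALSE) {
  n <- nrow(adj)
  if (tmaxdev) return((n - 1)^2)
  rowSums(adj)
}

# Freeman-style centralization built on any index with that interface.
centralization_sketch <- function(adj, index) {
  scores <- index(adj)
  sum(max(scores) - scores) / index(adj, tmaxdev = TRUE)
}

star <- matrix(0, 5, 5)
star[1, 2:5] <- 1                    # vertex 1 sends to every other vertex
ring <- matrix(0, 5, 5)
ring[cbind(1:5, c(2:5, 1))] <- 1     # every vertex has outdegree 1

centralization_sketch(star, outdeg_index)   # maximally centralized
centralization_sketch(ring, outdeg_index)   # no variation in outdegree
```

The only contract between the wrapper and the index is the `tmaxdev` argument, which is why a user-written index can be plugged in without further changes.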

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
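To make the definition of the structure statistics concrete, the following base R sketch computes them for a three-vertex directed path. It is an illustration under stated assumptions rather than the package's implementation: geodesic distances are obtained with a plain Floyd-Warshall pass, and `geodesic_dist` and `structure_stats` are hypothetical helper names.

```r
# All-pairs geodesic distances for a binary adjacency matrix (Inf = unreachable).
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)       # one step wherever an edge exists
  diag(d) <- 0                       # each vertex reaches itself in zero steps
  for (k in seq_len(n))              # Floyd-Warshall relaxation through vertex k
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), i = 0..N-1
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_dist(adj)
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

path3 <- matrix(0, 3, 3)
path3[1, 2] <- 1                     # 1 -> 2
path3[2, 3] <- 1                     # 2 -> 3
structure_stats(path3)
```

For this path, three ordered pairs are at distance 0, two more at distance 1, and one more at distance 2, so the series is 3/9, 5/9, 6/9; it never reaches 1 because vertex 3 cannot reach the others.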

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
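The logic of a one-tailed CUG test can be sketched in a few lines of base R. The example below conditions on order and number of edges, and uses the count of mutual dyads as the test statistic; it is a minimal illustration, not the cugtest implementation, and `recip_count`, `rand_given_edges`, and `cug_edges` are hypothetical names.

```r
set.seed(1)

# Test statistic: number of mutual dyads in a binary, loopless digraph.
recip_count <- function(adj) sum(adj * t(adj)) / 2

# Uniform draw from digraphs on n vertices with exactly m edges (no loops).
rand_given_edges <- function(n, m) {
  adj <- matrix(0, n, n)
  off <- which(row(adj) != col(adj))   # off-diagonal cell indices
  adj[sample(off, m)] <- 1
  adj
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_edges <- function(adj, stat, reps = 500) {
  n <- nrow(adj)
  m <- sum(adj)
  obs <- stat(adj)
  sim <- replicate(reps, stat(rand_given_edges(n, m)))
  c(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

mutual <- matrix(0, 5, 5)
mutual[1, 2] <- mutual[2, 1] <- 1
mutual[3, 4] <- mutual[4, 3] <- 1      # four edges forming two mutual dyads
cug_edges(mutual, recip_count)
```

Because four edges can form at most two mutual dyads, every simulated graph has a statistic no larger than the observed one, so the lower-tail estimate is exactly 1 while the upper-tail estimate is small.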

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+     dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+     cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+     g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
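The distance-then-cluster workflow can be illustrated with base R alone. The sketch below assumes a simple Hamming-type positional distance that compares the rows and columns of two vertices while ignoring the entries that involve the pair itself; this is one reasonable reading of the definition of structural equivalence, not necessarily the rule used by sedist, and `se_hamming` is a hypothetical name.

```r
# Hamming-type structural equivalence distance between all vertex pairs.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    keep <- setdiff(seq_len(n), c(i, j))             # third parties only
    d[i, j] <- sum(adj[i, keep] != adj[j, keep]) +   # out-neighborhoods
               sum(adj[keep, i] != adj[keep, j])     # in-neighborhoods
  }
  d
}

adj <- matrix(0, 4, 4)
adj[1, 3] <- adj[1, 4] <- 1     # vertices 1 and 2 both send to 3 and 4
adj[2, 3] <- adj[2, 4] <- 1

d <- se_hamming(adj)
hc <- hclust(as.dist(d), method = "complete")
cutree(hc, k = 2)               # two positions: senders and receivers
```

Vertices 1 and 2 (and likewise 3 and 4) are at distance zero, so cutting the dendrogram into two clusters recovers the two positions exactly.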

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
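Density-based expansion amounts to a Bernoulli draw from an expanded probability matrix, which can be sketched in base R as follows. The function `expand_density` and its arguments are hypothetical and only mirror the description above; the block image is chosen with densities of 0 and 1 so that the outcome is easy to verify.

```r
set.seed(42)

# Draw a digraph from a block density image.
# block_density: k x k matrix of interclass edge probabilities
# sizes:         number of vertices to create in each of the k classes
expand_density <- function(block_density, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)   # class label of each new vertex
  p <- block_density[cls, cls, drop = FALSE]    # expanded Bernoulli parameters
  n <- length(cls)
  g <- matrix(rbinom(n * n, 1, p), n, n)        # independent edge draws
  diag(g) <- 0                                  # no loops
  g
}

# Two classes: class 1 always sends to class 2, and nothing else occurs.
image <- matrix(c(0, 1,
                  0, 0), 2, 2, byrow = TRUE)
g_new <- expand_density(image, sizes = c(2, 3))
g_new
```

Since `sizes` is free, the expanded graph need not have the same order as the graph from which the image was estimated.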

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
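The boolean inner product form of composition is easy to check by hand. The following base R sketch uses a hypothetical `compose` helper (not the package operator) on two loopless three-vertex graphs, and shows how a loop can appear in the result.

```r
# Boolean inner product of two adjacency matrices:
# (v, v') is an edge iff some v'' has (v, v'') in a and (v'', v') in b.
compose <- function(a, b) (a %*% b > 0) * 1

g <- matrix(0, 3, 3)
g[1, 2] <- 1                 # G:  1 -> 2

gp <- matrix(0, 3, 3)
gp[2, 1] <- 1                # G': 2 -> 1
gp[2, 3] <- 1                #     2 -> 3

gc <- compose(g, gp)
gc
```

The composition contains the edge 1 -> 3 (via vertex 2) and also the loop 1 -> 1 (out along G, back along G'), even though neither input graph has any loops.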

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G$, $H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \tag{3}$$

where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
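The graph covariance, graph correlation, and Hamming distance for a pair of loopless digraphs on a common vertex set can be written directly from their definitions. The base R sketch below is illustrative only; `offdiag`, `graph_cov`, `graph_cor`, and `hamming` are hypothetical names rather than package functions.

```r
# Vector of off-diagonal cells, i.e., the |V|(|V| - 1) directed edge variables.
offdiag <- function(a) a[row(a) != col(a)]

# Graph covariance: mean cross-product of centered edge variables.
graph_cov <- function(a, b) {
  x <- offdiag(a)
  y <- offdiag(b)
  mean((x - mean(x)) * (y - mean(y)))
}

# Graph correlation: covariance scaled by the two graph variances.
graph_cor <- function(a, b)
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))

# Hamming distance: number of edge variables on which the graphs differ.
hamming <- function(a, b) sum(offdiag(a) != offdiag(b))

cyc <- matrix(0, 3, 3)
cyc[cbind(1:3, c(2, 3, 1))] <- 1     # 1 -> 2 -> 3 -> 1
rev_cyc <- t(cyc)                    # the same cycle with every edge reversed

graph_cor(cyc, cyc)
graph_cor(cyc, rev_cyc)
hamming(cyc, rev_cyc)
```

On three vertices the cycle and its reversal occupy complementary sets of ordered pairs, so their graph correlation is -1 and all six edge variables differ.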

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
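The QAP null hypothesis described above can be sketched in base R for the simplest case of a bivariate graph correlation. The code below is an illustration under stated assumptions (loopless digraphs, a one-sided upper-tail test, simultaneous row and column permutation of the second graph); `qap_cor` is a hypothetical name and is not the qaptest interface.

```r
set.seed(7)

# Upper-tail QAP test for the correlation between two graphs' edge variables.
qap_cor <- function(a, b, reps = 500) {
  off <- row(a) != col(a)                 # ignore the diagonal
  obs <- cor(a[off], b[off])
  sim <- replicate(reps, {
    p <- sample(nrow(b))                  # random relabeling of vertices
    bp <- b[p, p]                         # permute rows and columns together
    cor(a[off], bp[off])
  })
  c(obs = obs, p_upper = mean(sim >= obs))
}

n <- 8
a <- matrix(rbinom(n * n, 1, 0.4), n, n)
diag(a) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)
b <- abs(a - flip)                        # b is a noisy copy of a
diag(b) <- 0

res <- qap_cor(a, b)
res
```

Permuting rows and columns together preserves the structure of the second graph while breaking its alignment with the first, which is what distinguishes QAP from an ordinary permutation of the vectorized edge variables.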


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306

R> gcor(g1, g3)
[1] 0.08908708

R> gcor(g2, g3)
[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173  0.680   0.320   0.503
x1           0.9411361   2.5628914  0.985   0.015   0.019
x2           4.1473292  63.2648084  1.000   0.000   0.000
x3           1.8630911   6.4436238  1.000   0.000   0.000
x4          -0.1757242   0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
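The conditional tie probability above is a simple "at least one event occurs" calculation, which the following base R sketch evaluates for a few cases. The function `bn_tie_prob` and the particular probabilities used are hypothetical and chosen only for illustration.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^(s_k)
# p_base: baseline event probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      vector of counts s_k of potential bias events for the (i, j) dyad
bn_tie_prob <- function(p_base, p_bias, s)
  1 - (1 - p_base) * prod((1 - p_bias)^s)

p_bias <- c(reciprocity = 0.5, triad = 0.25)

bn_tie_prob(0.17, p_bias, s = c(0, 0))   # no bias event possible: baseline only
bn_tie_prob(0.17, p_bias, s = c(1, 0))   # one reciprocity event possible
bn_tie_prob(0.17, p_bias, s = c(1, 2))   # reciprocity plus two triad events
```

With no applicable bias events the probability reduces to the baseline 0.17; one potential reciprocity event raises it to 1 - 0.83 * 0.5 = 0.585, and each additional potential event can only raise it further.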

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \epsilon, \tag{4}$$

$$\epsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \epsilon + \nu, \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
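To see what an AR-only specification implies, one can simulate a response vector directly from the model. The base R sketch below assumes a single weight matrix $W$ with parameter $\theta$, no MA term (so the error equals $\nu$), and solves $y = \theta W y + X\beta + \nu$ for $y$; all numerical values and variable names are hypothetical and chosen only for illustration.

```r
set.seed(3)

n <- 6
w <- matrix(0, n, n)
w[cbind(1:n, c(2:n, 1))] <- 1        # directed ring: i is influenced by i + 1
theta <- 0.4                         # AR parameter; |theta| < 1 keeps I - theta W invertible here
x <- cbind(1, rnorm(n))              # intercept plus one covariate
beta <- c(2, -1)
nu <- rnorm(n, sd = 0.1)             # iid disturbances

# Reduced form: y = (I - theta W)^{-1} (X beta + nu)
y <- solve(diag(n) - theta * w, x %*% beta + nu)

# The simulated y satisfies the structural equation up to rounding error.
max(abs(y - (theta * (w %*% y) + x %*% beta + nu)))
```

The reduced form makes the propagation explicit: expanding the inverse as a power series in $\theta W$ shows that each response absorbs the covariates and disturbances of neighbors, of neighbors' neighbors, and so on.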


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> # True network and informant error rates
R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> # One error-prone report of the whole network per informant
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> # Uninformative network prior and beta(2, 11) priors on all error rates
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20  Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115
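The potential scale reduction reported under the MCMC diagnostics compares between-chain and within-chain variability (Gelman and Rubin 1992); values near 1 suggest that the replicate chains agree. As a minimal sketch, assuming the textbook form of the statistic (sna's internal computation may differ in detail), it can be computed in base R as follows. The helper psrf and the simulated chains are hypothetical and are not part of sna.

```r
# Square root of the Gelman-Rubin potential scale reduction for one scalar
# quantity. `chains` is an n x m matrix holding n draws from each of m chains.
# psrf is a hypothetical helper written for this sketch.
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(42)
# Five well-mixed chains of 20 draws each (same layout as the example above).
chains <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
r_hat <- psrf(chains)

# The same chains with very different means, mimicking a failure to converge.
shifted <- sweep(chains, 2, c(0, 2, 4, 6, 8), "+")
r_hat_shifted <- psrf(shifted)

print(c(mixed = r_hat, shifted = r_hat_shifted))
```

Chains drawn from the same distribution give a value close to 1, as in the summary above, while chains centered on different values give a value well above 1.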

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
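To make the robustness condition concrete, one can check by simulation how much mass a pair of beta(2, 11) priors places on the region em + ep > 1. The sketch below uses base R only and assumes that the two error rates are independent a priori; the variable names and the number of draws are arbitrary choices made for illustration.

```r
# Monte Carlo check of the prior mass on the "perverse" region em + ep > 1
# under independent beta(2, 11) priors (independence is an assumption here).
set.seed(1)
n_draws <- 100000
em_prior <- rbeta(n_draws, 2, 11)   # draws of a false negative rate
ep_prior <- rbeta(n_draws, 2, 11)   # draws of a false positive rate

prior_mean <- 2 / (2 + 11)          # analytic mean of a beta(2, 11) variate
perverse_mass <- mean(em_prior + ep_prior > 1)

print(c(prior_mean = prior_mean, perverse_mass = perverse_mass))
```

The prior mean of roughly 0.15 matches neither of the true error rate means used to generate the data, yet the mass on the perverse region is negligible, which is consistent with the estimates above remaining accurate.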

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
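The heuristic estimators compared above are simple enough to sketch directly. The base R code below is an illustrative reimplementation, not sna's consensus function: las and central_graph are hypothetical helpers, the informants are assumed to be the vertices themselves (so that dat[i, , ] is vertex i's report on the whole network), and the strict-majority rule used for the central graph is an assumption about how ties are broken.

```r
# Locally aggregated structure (LAS): the (i, j) tie is decided only by the
# reports of i and j themselves. Hypothetical helper, not sna's consensus().
las <- function(dat, rule = c("union", "intersection")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    own <- c(dat[i, i, j], dat[j, i, j])   # reports by i and by j on (i, j)
    out[i, j] <- if (rule == "union") max(own) else min(own)
  }
  out
}

# Central graph: keep a tie when a strict majority of informants report it.
central_graph <- function(dat) (apply(dat, c(2, 3), mean) > 0.5) * 1

# Toy data: three informants who all report the true graph g, except that
# informant 1 fails to report their own tie to vertex 2 (a false negative).
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
dat <- array(0, dim = c(3, 3, 3))
for (k in 1:3) dat[k, , ] <- g
dat[1, 1, 2] <- 0

acc <- c(union        = mean(las(dat, "union") == g),
         intersection = mean(las(dat, "intersection") == g),
         central      = mean(central_graph(dat) == g))
print(acc)
```

A single false negative is enough to remove the tie under the intersection rule, whereas the union rule and the central graph both recover it; this is the mechanism behind the poor performance of the intersection LAS in the example above.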

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

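R> # Weight matrices for the AR (w1) and MA (w2) terms, and covariates
R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> # True AR and MA parameters, disturbance scale, and coefficients
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> # Generate disturbances and responses from the model
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)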

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form: (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
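For reference, the simulated responses in this example follow from solving equations (4) and (5) for ε and y in turn. The reduced form below is a short derivation from those two equations, under the assumption that both bracketed matrices are invertible; in the example, w = z = 1, with θ₁ = 0.2 (r1) and ψ₁ = 0.3 (r2), and the two calls to qr.solve carry out the two inversions.

```latex
\begin{align*}
\epsilon &= \Bigl( I - \sum_{i=1}^{z} \psi_i Z_i \Bigr)^{-1} \nu, \\
y &= \Bigl( I - \sum_{i=1}^{w} \theta_i W_i \Bigr)^{-1} \bigl( X\beta + \epsilon \bigr).
\end{align*}
```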

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.)

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01; Accepted: 2007-12-25


1. General in coverage, incorporating a range of different network analytic techniques;

2. Easily extensible, to allow for the timely incorporation of new methods and/or refinements;

3. Well-integrated with general purpose statistical, computational, and visualization tools, so as to facilitate the use of network analysis in conjunction with both end-user extensions and broader social science methodology;

4. Based on an open codebase, which is available for inspection (and hence emulation, correction, and improvement) by the network community;

5. Portable, to allow use by researchers on a variety of computing platforms; and

6. Freely available to network researchers, so as to encourage its use among the widest possible range of scientists, practitioners, and students.

This "wish list" of attributes would seem to be a great deal to ask of any single, standalone program; the emergence of open statistical computing platforms such as R (R Development Core Team 2007), however, has provided a feasible means of realizing such objectives. Using R (which is itself free software in the Stallmanian sense; see Stallman 2002), researchers can easily produce and share packages which supply specialized functionality, but which are interoperable with other statistical computing tools. In this vein, the sna package was created as a mechanism for fulfilling the above objectives within the R environment. Additional motivations for the introduction of sna were to encourage the migration of the social network community to open source and/or free software solutions, to facilitate the creation of a shared framework for dissemination of new methodological developments, to further the development of statistical network analysis methods by network analysts, and to ease the integration of network methods with those of "standard" statistical analysis.

1.1. Package history

sna began life as a loose collection of S routines (called "Various Useful Tools for Network Analysis in S," or network.S.tools), written by the author, which were disseminated locally to social network researchers in and around the research community at Carnegie Mellon University and the University of Pittsburgh. The first external use of the toolkit of which the author is aware was the netlogit analysis employed by Ingram and Roberts (2000). The first version of the collection to be generally disseminated (version 0.1) was released in August of 2000, with the first R package version (sna version 0.3) appearing in May of 2001. Multiple releases followed over subsequent years, with the package reaching the "1.0" landmark in August of 2005. Development has been ongoing; as of the time of this writing, the package is on version 1.5.

1.2. sna and statnet

As noted above, a major goal in introducing sna was the creation of a foundation for ongoing development of tools within the network analysis community. The statnet project (Handcock et al. 2003) represents the latest incarnation of that objective (much as BioConductor, Gentleman et al. 2004, serves as a site for tool development within the bioinformatics community); in some sense, then, statnet is the natural "successor" to sna. Reflecting this relationship, sna is now considered to be part of the statnet project, and is fully interoperable with other statnet packages (including network). sna may still be employed as a stand-alone package, however, for users who do not require the full range of functionality provided by statnet.

1.3. Functionality

At present, the sna package includes over 125 functions for the manipulation and analysis of network data. Supported functionality includes:

• Functions to compute descriptive indices at the graph or node level. This includes centrality and centralization indices, measures of hierarchy and prestige, brokerage, density, reciprocity, transitivity, connectedness, and the like, as well as dyad, triad, path, and cycle census statistics. Stand-alone routines to facilitate the comparison of index values across graphs via conditional uniform graph (CUG) tests are included.

• Functions to compute geodesic distances, component structure and distribution, and structure statistics (in the sense of Fararo and Sunshine 1964), and to identify isolates.

• Functions for positional and role analysis, including structural equivalence and blockmodeling.

• Functions for exploratory edge set comparison, in the paradigm of Butts and Carley (2005). This includes structural covariance/correlation and distance routines, as well as tools for scaling and visualization of graph sets. Network regression (Krackhardt 1988), canonical correlation analysis, and logistic network regression are also supported; QAP (Hubert 1987; Krackhardt 1987b) and CUG tests are currently implemented for all three approaches.

• Functions to generate graph-valued deviates from various stochastic processes. So-called Erdős-Rényi graphs, inhomogeneous Bernoulli graphs, and dyad census conditioned graphs are supported, as are graphs produced by Watts-Strogatz rewiring processes (Watts and Strogatz 1998) and the biased net models of Skvoretz et al. (2004) and Rapoport (1957).

• Functions to fit network autocorrelation (also known as spatial autocorrelation; see Anselin 1988) and biased net models.

• Functions for network inference (i.e., inferring networks from multiple reports containing missing and/or error-prone data). This includes heuristic estimators such as Krackhardt's (Krackhardt 1987a) locally aggregated structure estimators and the central graph (Banks and Carley 1994), as well as model-based methods such as the Romney-Batchelder consensus model (Romney et al. 1986) and the error-rate models of Butts (2003).

• Functions for visualization and manipulation of network data (in adjacency matrix form). Standard graph layout methods such as those of Fruchterman and Reingold (1991) and Kamada and Kawai (1989), general multidimensional scaling/eigenstructure methods, and "target" diagrams (Brandes et al. 2003) are included by default, and custom layout routines are also supported. Functions are included to facilitate common tasks such as extracting neighborhoods and egocentric networks, symmetrization, application of functions to attribute information on neighborhoods (e.g., computing neighbors' mean attributes), dichotomization, permutation/relabeling, and the creation of interval graphs from spell data. Data import/export is supported for several basic file formats.

The above includes many of the methods of what is sometimes called "classical" social network analysis (exemplified by Wasserman and Faust (1994), whose presentation is now canonical), as well as some more recent contributions to the literature. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

1.4. Terminology and data representation

Because sna is a special-purpose toolkit dedicated to social network analysis, describing its functionality requires us to refer to standard SNA concepts and methods; readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below.

Throughout this paper, we will be concerned with relational data, consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints, with the first element known as the tail (or sender) and the second known as the head (or receiver) in the ordered case. An edge whose endpoints are identical is called a loop. The combination of an edge set, E, with vertex set V is said to be a graph (denoted G = (V, E)). The size or order of a graph is the number of elements in its vertex set (denoted |V|, where | · | is the cardinality operator).

Specific types of graphs may be identified via the constraints satisfied by E. If the elements of E are unordered multisets, G is said to be an undirected graph; if edges are ordered multisets, by contrast, G is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex v is called the neighborhood of v (denoted N(v)). In the directed case, we distinguish between the set of vertices sending edges to v (the in-neighborhood, or $N^-(v)$) and the set of vertices receiving edges from v (the out-neighborhood, or $N^+(v)$). A graph (directed or otherwise) is simple if it has no loops, and if there exists no edge having multiplicity greater than one. Finally, a graph's edge set may be associated with a set of variables, such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary, unvalued case.

It is worth noting that use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field's strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called "points" or "nodes" (or, in social contexts, "actors" or "agents"). Likewise, edges may be called "lines," "ties," or (if directed) "arcs." The term "network" is often used generically to refer to any relational structure; in other cases, it may be reserved to refer to the actually existing relational structure, with "graph" being employed for that structure's formal representation. In the latter instance, "tie" is frequently used as the corresponding term for an actually existing relationship, with "edge" denoting the formal representation of that relationship. While such terminological subtleties are not required to use sna, an awareness of them may reduce confusion among users seeking to make use of the literature cited within the package manual.

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna's functions typically default to the assumption that one's data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix $A$ whose elements are defined such that $A_{ij}$ is the value of the $(i, j)$ edge (or $\{i, j\}$ edge, in the undirected case) in the corresponding graph. By convention, $A_{ij}$ is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (Koenker and Ng 2007, SparseM) objects may be preferred. sna accepts all three such data types interchangeably.
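To make the adjacency matrix convention concrete, the following base R sketch builds a small unvalued digraph from a list of (tail, head) pairs. The edge list and object names are illustrative assumptions, not part of sna.

```r
# Hypothetical three-edge digraph on four vertices: one (tail, head) pair per row.
edges <- rbind(c(1, 2), c(2, 3), c(4, 1))
n <- 4

# A[i, j] = 1 if there is an edge from i to j, and 0 otherwise.
A <- matrix(0, nrow = n, ncol = n)
A[edges] <- 1

# Undirected reading: an {i, j} edge fills both A[i, j] and A[j, i].
A_undirected <- pmax(A, t(A))

print(A)
print(isSymmetric(A_undirected))
```

A matrix built this way is an ordinary R matrix object, which is the most basic of the input forms described above.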

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternatively, it is also possible to specify multiple graphs by means of a list. This allows the user to pass graph sets of varying orders, where required. Within a graph list, single adjacency matrices, adjacency arrays, network, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
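The two multi-graph layouts can be sketched in base R as follows. The graphs themselves are arbitrary illustrations; only the array and list shapes follow the description above.

```r
n <- 4
g1 <- matrix(0, n, n); g1[1, 2] <- 1
g2 <- matrix(0, n, n); g2[2, 3] <- 1; g2[3, 4] <- 1

# Same-order graphs: a 2 x n x n array; the first index selects the graph.
stack <- array(0, dim = c(2, n, n))
stack[1, , ] <- g1
stack[2, , ] <- g2

# Varying orders: a plain list can mix a stack with a single larger matrix.
g3 <- matrix(0, 6, 6); g3[1, 6] <- 1
graph_list <- list(stack, g3)

# Edge counts per graph in the stack (apply over the first dimension).
edge_counts <- apply(stack, 1, sum)
print(edge_counts)
```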

Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
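The idea of an interval graph can be illustrated with a minimal base R sketch, in which two spells are adjacent when their intervals overlap. The two-column spell layout and the helper name are assumptions made for illustration; this is not the input format of interval.graph.

```r
# Hypothetical spell data: one row per spell, giving onset and termination.
spells <- rbind(c(0, 5), c(3, 8), c(6, 9), c(10, 12))
n <- nrow(spells)

# Two spells are adjacent when their closed intervals share at least one point.
overlaps <- function(a, b) a[1] <= b[2] && b[1] <= a[2]

A <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j && overlaps(spells[i, ], spells[j, ])) A[i, j] <- 1
  }
}
print(A)
```

Replacing the 0/1 indicator with the length of the shared interval would give a valued graph in the spirit of the overlap graphs mentioned above.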


2. Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1. Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a "workhorse" function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹
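As a rough illustration of what a Bernoulli graph draw involves, the base R sketch below treats every off-diagonal cell of the adjacency matrix as an independent Bernoulli trial. The function draw_bernoulli_digraph is a hypothetical stand-in written for this example and does not reproduce rgraph's interface or internals.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Draw one digraph of order n from tie probabilities p (a scalar or an
# n x n matrix), treating each off-diagonal cell as an independent trial.
draw_bernoulli_digraph <- function(n, p) {
  p <- matrix(p, n, n)                      # a scalar is recycled to n x n
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0                              # no loops
  g
}

g_hom <- draw_bernoulli_digraph(10, 0.5)              # homogeneous case
p_inhom <- matrix((1:10) / 10, 10, 10, byrow = TRUE)  # P(i -> j) = j / 10
g_inhom <- draw_bernoulli_digraph(10, p_inhom)        # inhomogeneous case
print(colMeans(g_inhom))
```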

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix; although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

R> library(sna)
R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
R> gden(g)
[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs, via a resampling procedure.

gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an $(i, j)$ edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10 in which the probability of an $(i, j)$ edge is equal to $j/10$. Such a graph can be drawn easily, as follows:

R> g.p <- sapply((1:10)/10, rep, 10)
R> g <- rgraph(10, tprob = g.p)
R> g
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0
R> apply(g, 2, mean)
 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation $0.9(j/10)$. The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)
[1] 12 12 12 12 12
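Conditioning on edge count amounts to choosing which cells of the adjacency matrix are filled, rather than flipping a coin for each. A minimal base R sketch of the directed, loopless case follows; draw_gnm_digraph is a hypothetical helper written for this example and is not rgnm's implementation.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Draw a digraph of order n with exactly m edges, uniformly at random, by
# choosing m of the n * (n - 1) off-diagonal cells without replacement.
draw_gnm_digraph <- function(n, m) {
  g <- matrix(0, n, n)
  cells <- which(row(g) != col(g))
  g[cells[sample.int(length(cells), m)]] <- 1
  g
}

g_fixed <- draw_gnm_digraph(10, 12)
print(sum(g_fixed))
```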

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0
R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0
R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

When not in "exact" mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
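The Bernoulli comparison figures follow from a short calculation, sketched here in base R under the assumption that the expected density implied by the dyad probabilities is the mutual probability plus half the asymmetric probability. Variable names are illustrative.

```r
p_mut <- 0.15; p_asym <- 0.05; p_null <- 0.80

# Each mutual dyad contributes two edges and each asymmetric dyad one, out
# of two possible, so the expected density is p_mut + p_asym / 2.
exp_density <- p_mut + p_asym / 2

# Under a Bernoulli digraph with that density, the two edge variables of a
# dyad are independent.
bern_mut  <- exp_density^2
bern_asym <- 2 * exp_density * (1 - exp_density)
bern_null <- (1 - exp_density)^2

print(c(density = exp_density, mutual = bern_mut,
        asym_to_mutual = bern_asym / bern_mut))
```

With these inputs the density is 0.175, the Bernoulli mutuality rate is about 0.031, and the asymmetric-to-mutual ratio is about 9.4, in line with the values quoted above.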

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce "parent" or reciprocity biases ($\pi$), "sibling" or shared partner biases ($\sigma$), and "double role" biases or parent/sibling interaction effects ($\rho$), as well as baseline density effects ($d$); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+   d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
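The invariance can be checked directly in base R by applying one random permutation to both the rows and the columns of an adjacency matrix. This is a sketch of the idea on an arbitrary example graph, not of rmperm itself.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

g <- matrix(0, 5, 5)
g[1, 2:4] <- 1
g[5, 1] <- 1

# Relabel vertices: apply the same random permutation to rows and columns.
perm <- sample.int(nrow(g))
g_perm <- g[perm, perm]

# The edge count and the sorted degree sequences are unchanged by relabeling.
same_edges <- sum(g) == sum(g_perm)
same_out <- all(sort(rowSums(g)) == sort(rowSums(g_perm)))
same_in  <- all(sort(colSums(g)) == sort(colSums(g_perm)))
print(c(same_edges, same_out, same_in))
```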

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667
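The present-versus-absent difference itself is simple to state in code. The following base R sketch uses a hypothetical helper, edge_perturbation, and a hypothetical statistic, mutual_count; neither is part of sna, and the sketch is not how eval.edgeperturbation is implemented.

```r
# Difference in a graph statistic when the (i, j) edge is forced present
# versus forced absent, holding all other edges fixed.
edge_perturbation <- function(g, i, j, stat) {
  g_present <- g; g_present[i, j] <- 1
  g_absent  <- g; g_absent[i, j]  <- 0
  stat(g_present) - stat(g_absent)
}

# Hypothetical statistic: the number of mutual dyads in a loopless digraph.
mutual_count <- function(g) sum(g * t(g)) / 2

g <- matrix(0, 4, 4)
g[2, 1] <- 1
g[3, 4] <- 1

print(edge_perturbation(g, 1, 2, mutual_count))  # completes the 1 <-> 2 dyad
print(edge_perturbation(g, 1, 3, mutual_count))  # no reciprocating 3 -> 1 edge
```

Because the statistic is recomputed from scratch twice, the cost of each difference is two full evaluations of the statistic.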

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple, exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
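A class-restricted permutation of the kind described for rperm can be sketched in base R by shuffling positions separately within each class. The function restricted_perm and the class labels are assumptions made for illustration, and the sketch does not reproduce rperm's argument list.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Random permutation vector in which positions may be exchanged only
# within the same equivalence class.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # Index via sample.int so that singleton classes are left in place.
    perm[idx] <- idx[sample.int(length(idx))]
  }
  perm
}

classes <- c("a", "a", "b", "b", "b", "c")   # hypothetical class labels
perm <- restricted_perm(classes)
print(perm)
```

Applying such a vector to both the rows and columns of an adjacency matrix relabels vertices without moving any vertex out of its class.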

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i, j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
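The partial and cumulative definitions can be sketched in base R by first computing geodesic distances on the base relation $R$ and then thresholding them. The helper geodesic_distances is written for this example only; the sketch also assumes that self-pairs (distance 0) are excluded from the cumulative neighborhood.

```r
# Geodesic distances in a digraph, found by extending walks one step at a
# time; unreachable pairs keep distance Inf.
geodesic_distances <- function(adj) {
  n <- nrow(adj)
  dist <- matrix(Inf, n, n)
  diag(dist) <- 0
  walks <- diag(n)                          # walks of length 0
  for (k in seq_len(n - 1)) {
    walks <- ((walks %*% adj) > 0) * 1      # 1 where a walk of length k exists
    newly_reached <- walks == 1 & is.infinite(dist)
    dist[newly_reached] <- k
  }
  dist
}

# Directed path 1 -> 2 -> 3 -> 4; taking R = G gives outgoing neighborhoods.
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1
d <- geodesic_distances(g)

partial_2    <- (d == 2) * 1               # pairs at distance exactly 2
cumulative_2 <- (d >= 1 & d <= 2) * 1      # pairs at distance 1 or 2
print(partial_2)
print(cumulative_2)
```

Passing t(g) instead of g would give incoming neighborhoods, and pmax(g, t(g)) would give total neighborhoods, following the choice of base relation described above.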

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
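The alter-only density calculation can be reproduced without sna in a few lines of base R. The helpers ego_net and digraph_density below are hypothetical names introduced for this sketch, which follows the convention of placing ego in the first row and column.

```r
# Induced subgraph on v and its combined (in or out) neighborhood, with ego
# placed in the first row and column.
ego_net <- function(g, v) {
  alters <- setdiff(which(g[v, ] > 0 | g[, v] > 0), v)
  g[c(v, alters), c(v, alters), drop = FALSE]
}

# Density of a loopless digraph: edges present over ordered pairs possible.
digraph_density <- function(g) {
  n <- nrow(g)
  if (n < 2) return(NA_real_)
  sum(g[row(g) != col(g)]) / (n * (n - 1))
}

g <- matrix(0, 5, 5)
g[1, 2] <- 1; g[1, 3] <- 1; g[4, 1] <- 1   # ego 1 is tied to 2, 3, and 4
g[2, 3] <- 1                               # one tie among the alters

ego <- ego_net(g, 1)
alter_density <- digraph_density(ego[-1, -1, drop = FALSE])
print(alter_density)
```

Here one of the six possible ordered ties among the three alters is present, so the alter density is 1/6.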

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman")/2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$. (Panels: Partial Neighborhood of Order 1 through Order 9.)

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$. (Panels: Cumulative Neighborhood of Order 1 through Order 9.)

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.)

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10)/10 * 2 * pi), sin((1:10)/10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2)/4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions. (Panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.)

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.
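The distinction between the two classes of index can be seen in a short base R sketch, in which a node-level index returns one value per vertex and a graph-level index returns one value per graph. The function names are illustrative and are not sna's own.

```r
g <- matrix(0, 4, 4)
g[1, 2] <- 1; g[1, 3] <- 1; g[3, 1] <- 1

# Node-level index: one value per vertex (here, outdegree).
outdegree_index <- function(g) rowSums(g)

# Graph-level index: one value per graph (here, density of a loopless digraph).
density_index <- function(g) sum(g) / (nrow(g) * (nrow(g) - 1))

print(outdegree_index(g))
print(density_index(g))
```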

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3-5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v,G) \equiv c_{d^+}(v,G) + c_{d^-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as $c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)}$, where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v',v,v'',G)}{g(v',v'',G)}$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by $c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')}$, where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
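The degree and closeness definitions above can be checked by hand on a small graph. The following is a minimal base-R sketch, not sna's implementation: the toy adjacency matrix and the helper names geo_dist and closeness_sketch are hypothetical, and geodesic distances are obtained with a plain Floyd-Warshall pass.

```r
# Illustrative sketch only; adj[i, j] == 1 means a directed edge i -> j.
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 3] <- adj[3, 1] <- adj[3, 4] <- adj[4, 3] <- 1

outdeg      <- rowSums(adj)     # |N+(v)|
indeg       <- colSums(adj)     # |N-(v)|
freeman_deg <- outdeg + indeg   # total ("Freeman") degree

# Geodesic distances via Floyd-Warshall; unreachable pairs stay at Inf.
geo_dist <- function(a) {
  n <- nrow(a)
  d <- ifelse(a > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n)) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (d[i, k] + d[k, j] < d[i, j]) d[i, j] <- d[i, k] + d[k, j]
      }
    }
  }
  d
}

# Closeness: (n - 1) divided by the sum of geodesic distances from v.
# A single infinite distance makes the sum infinite and the score 0.
closeness_sketch <- function(a) {
  (nrow(a) - 1) / rowSums(geo_dist(a))
}

closeness_sketch(adj)

# Adding an isolate creates a second weak component, so every score is 0.
adj_iso <- rbind(cbind(adj, 0), 0)
closeness_sketch(adj_iso)
```

The second call illustrates the fragility noted above: one unreachable vertex is enough to drive every closeness score to 0.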

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter, $\alpha$, is by convention set so as to result in a centrality vector whose squared length equals $|V|$ (as can be verified from the bonpow output shown later in this section); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
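As a rough illustration of the Bonacich formula, the sketch below evaluates $\alpha (I - \beta A)^{-1} A \mathbf{1}$ directly with base R's solve. It is not sna's bonpow: the function name bonpow_sketch and both toy graphs are hypothetical, and $\alpha$ is simply left at 1, which is harmless given that the measure is defined only up to rescaling.

```r
# Illustrative sketch only: c_bp = alpha * (I - beta * A)^{-1} A 1.
bonpow_sketch <- function(a, beta, alpha = 1) {
  n <- nrow(a)
  drop(alpha * solve(diag(n) - beta * a, a %*% rep(1, n)))
}

# Hypothetical directed toy graph: adj[i, j] == 1 means i -> j.
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 0,
                1, 0, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)

bonpow_sketch(adj, beta = 0)      # reduces to outdegree (row sums of A)
bonpow_sketch(adj, beta = -0.5)   # negative beta: ties to weak alters help

# Hypothetical undirected graph (a triangle with one pendant vertex), used to
# show the limiting behavior as beta nears 1 / (principal eigenvalue).
sym <- matrix(0, 4, 4)
sym[1, 2] <- sym[2, 1] <- sym[2, 3] <- sym[3, 2] <- 1
sym[1, 3] <- sym[3, 1] <- sym[3, 4] <- sym[4, 3] <- 1
ev   <- eigen(sym)
near <- bonpow_sketch(sym, beta = 0.999 / ev$values[1])
cor(near, abs(ev$vectors[, 1]))   # close to 1: nearly the eigenvector scores
```

The symmetric case is used for the limit check only because eigen then returns real, ordered eigenvalues, which keeps the sketch short.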

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k, s_i \neq s_j$), gatekeeping ($s_j = s_k, s_i \neq s_j$), representative ($s_i = s_j, s_j \neq s_k$), and liaison ($s_i \neq s_j, s_j \neq s_k, s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
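The role definitions above translate fairly directly into a brute-force count over ordered triads. The following base-R sketch is an illustration of those definitions rather than sna's brokerage routine; the function names role_of and brokerage_sketch, the role labels, and the toy data are all hypothetical.

```r
# Illustrative sketch only. adj[i, j] == 1 means i -> j; s[i] is the state
# (group) of vertex i.
role_of <- function(si, sj, sk) {
  if (si == sj && sj == sk) {
    "coordinator"       # s_i = s_j = s_k
  } else if (si == sk) {
    "itinerant"         # s_i = s_k, s_i != s_j
  } else if (sj == sk) {
    "gatekeeper"        # s_j = s_k, s_i != s_j
  } else if (si == sj) {
    "representative"    # s_i = s_j, s_j != s_k
  } else {
    "liaison"           # all three states differ
  }
}

brokerage_sketch <- function(adj, s) {
  n <- nrow(adj)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative",
             "liaison", "total")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # v_j brokers the ordered pair (v_i, v_k) when i -> j and j -> k
        # are present but i -> k is absent.
        if (adj[i, j] == 1 && adj[j, k] == 1 && adj[i, k] == 0) {
          r <- role_of(s[i], s[j], s[k])
          out[j, r] <- out[j, r] + 1
          out[j, "total"] <- out[j, "total"] + 1
        }
      }
    }
  }
  out
}

# Hypothetical path 1 -> 2 -> 3 with vertices 1 and 2 in the same group.
path <- matrix(0, 3, 3)
path[1, 2] <- path[2, 3] <- 1
brokerage_sketch(path, s = c(1, 1, 2))
```

The triple loop is cubic in the number of vertices, which is acceptable for a small illustration but is not meant as an efficient algorithm.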

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above results illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
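To make the two forms of the Freeman index concrete, the sketch below computes the raw centralization both ways for undirected degree on a star graph. It is a base-R illustration, not sna's centralization function; the helper names are hypothetical, and the normalizing constant $(n-1)(n-2)$ is the standard maximum for undirected degree on graphs of order $n$, stated here as an assumption of the sketch rather than something taken from the text.

```r
# Illustrative sketch only: raw Freeman centralization of a score vector.
centralization_sum  <- function(scores) sum(max(scores) - scores)   # Eq. (1)
centralization_mean <- function(scores) {                           # Eq. (2)
  length(scores) * (max(scores) - mean(scores))
}

# Undirected star on n vertices: vertex 1 is tied to every other vertex.
n <- 6
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1
deg <- rowSums(star)

raw <- centralization_sum(deg)
raw                          # total deviation from the maximum degree
centralization_mean(deg)     # same value, via maximum minus mean

# Normalized score, assuming (n - 1)(n - 2) as the maximum raw value for
# undirected degree on graphs of order n.
raw / ((n - 1) * (n - 2))
```

For the star the normalized value comes out at 1, in line with the remark that maximum centralization generally occurs on star graphs.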

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
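The gap between dyadic and edgewise reciprocity in the listing (for the first graph, 0.8666667 versus 0) follows from how null dyads are treated: they count as symmetric in the dyadic measure but contribute nothing to the edgewise one. A minimal base-R sketch of the three reciprocity variants and of density is given below; it is not sna's gden or grecip, the toy digraph and variable names are hypothetical, and loops and missing data are ignored.

```r
# Illustrative sketch only; adj[i, j] == 1 means i -> j, with no loops.
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 0,
                0, 0, 0, 1,
                0, 0, 0, 0), nrow = 4, byrow = TRUE)
n  <- nrow(adj)
up <- upper.tri(adj)
ij <- adj[up]        # tie i -> j for each dyad with i < j
ji <- t(adj)[up]     # tie j -> i for the same dyad

n_mut  <- sum(ij == 1 & ji == 1)   # mutual dyads
n_null <- sum(ij == 0 & ji == 0)   # null dyads
n_asym <- sum(ij != ji)            # asymmetric dyads

density        <- sum(adj) / (n * (n - 1))               # edges / possible edges
dyadic_recip   <- (n_mut + n_null) / (n_mut + n_asym + n_null)
nonnull_recip  <- n_mut / (n_mut + n_asym)               # among non-null dyads
edgewise_recip <- 2 * n_mut / sum(adj)                   # reciprocated edges

c(density = density, dyadic = dyadic_recip,
  nonnull = nonnull_recip, edgewise = edgewise_recip)
```

In this toy graph half of the dyads are null, so the dyadic score exceeds the edgewise score, the same direction of difference seen for the sparse graphs in the listing.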

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if (tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
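The structure statistics can be computed directly from a matrix of geodesic distances. The sketch below does so in base R; it is not sna's structure.statistics, the helper names geo_dist and structure_stats are hypothetical, and distances are again obtained by a simple Floyd-Warshall pass rather than by geodist.

```r
# Illustrative sketch only; a[i, j] == 1 means i -> j.
geo_dist <- function(a) {
  n <- nrow(a)
  d <- ifelse(a > 0, 1, Inf)   # unreachable pairs stay at Inf
  diag(d) <- 0
  for (k in seq_len(n)) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (d[i, k] + d[k, j] < d[i, j]) d[i, j] <- d[i, k] + d[k, j]
      }
    }
  }
  d
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i).
structure_stats <- function(a) {
  n <- nrow(a)
  d <- geo_dist(a)
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Hypothetical directed 4-cycle: 1 -> 2 -> 3 -> 4 -> 1.
cyc <- matrix(0, 4, 4)
cyc[cbind(1:4, c(2:4, 1))] <- 1
structure_stats(cyc)
```

Since $d(j,j) = 0$, the first statistic is always $1/N$, which matches the leading 0.10 reported for the ten-vertex graph in the example later in this section.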

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
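The logic of a one-tailed CUG test can be sketched with a short Monte Carlo loop. The example below conditions on order and on the exact number of edges and uses the count of mutual dyads as the statistic $t$. It is a base-R illustration under those assumptions, not sna's cugtest: the function names, the observed graph, the seed, and the number of replications are all hypothetical choices.

```r
# Illustrative sketch only.
set.seed(1)   # arbitrary seed, used only to make the sketch reproducible

# Statistic t: number of mutual dyads in a loopless digraph.
count_mutual <- function(a) sum(a == 1 & t(a) == 1) / 2

# One uniform draw from the digraphs with n vertices and exactly m edges.
rgraph_nm <- function(n, m) {
  a <- matrix(0, n, n)
  off_diag <- which(row(a) != col(a))
  a[sample(off_diag, m)] <- 1
  a
}

# Hypothetical observed graph: 10 vertices, 5 mutual dyads (10 edges).
obs <- matrix(0, 10, 10)
pairs <- cbind(1:5, 6:10)
obs[pairs] <- 1
obs[pairs[, 2:1]] <- 1

t_obs <- count_mutual(obs)
t_sim <- replicate(1000, count_mutual(rgraph_nm(nrow(obs), sum(obs))))

p_upper <- mean(t_sim >= t_obs)   # Monte Carlo estimate of Pr(t(H) >= t(G))
p_lower <- mean(t_sim <= t_obs)   # Monte Carlo estimate of Pr(t(H) <= t(G))
c(upper = p_upper, lower = p_lower)
```

Because every edge in the observed graph is reciprocated, far more mutual dyads are present than uniform placement of ten edges would typically produce, so the upper-tail estimate is very small.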

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)
CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:   0
        Mean:  -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)
CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and outneighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
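The distance-then-cluster workflow described above can be sketched in base R. The example below computes a Hamming distance between the third-party tie profiles of each vertex pair and passes the result to hclust. It is an illustration of the idea rather than of sedist or equiv.clust; the function name se_hamming and the toy star graph are hypothetical, and ties between the two vertices being compared are left out, following the definition of structural equivalence given earlier.

```r
# Illustrative sketch only; a[i, j] == 1 means i -> j.
se_hamming <- function(a) {
  n <- nrow(a)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      others <- setdiff(seq_len(n), c(i, j))
      out_diff <- sum(a[i, others] != a[j, others])   # ties sent to others
      in_diff  <- sum(a[others, i] != a[others, j])   # ties received from others
      d[i, j] <- out_diff + in_diff
    }
  }
  d
}

# Hypothetical undirected star: vertex 1 is tied to vertices 2, 3, and 4.
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1

d <- se_hamming(star)
d   # the three pendants sit at distance 0 from one another

# Hierarchical clustering of the distances, then a cut into two positions.
hc <- hclust(as.dist(d))
cl <- cutree(hc, k = 2)
cl
```

The three pendant vertices fall into one class and the hub into another, which is the exact structural equivalence pattern that pendants are noted above to exhibit.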

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
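The expansion step for density blocks can be sketched in a few lines of base R. The helper `expand_density` below is hypothetical (it is not the sna implementation) and assumes a loopless digraph; it simply builds the Bernoulli parameter matrix from a density image and a vector of class sizes, then draws one graph.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical helper (not part of sna): draw one digraph from a density image.
expand_density <- function(image, class_sizes) {
  membership <- rep(seq_len(nrow(image)), times = class_sizes)
  # Bernoulli parameter matrix: entry (i, j) is the density of the block
  # joining i's class to j's class.
  p <- image[membership, membership, drop = FALSE]
  n <- length(membership)
  g <- matrix(rbinom(n * n, size = 1, prob = p), n, n)
  diag(g) <- 0  # no loops
  g
}

image <- matrix(c(0.9, 0.1,
                  0.1, 0.9), 2, 2, byrow = TRUE)
g_small <- expand_density(image, c(3, 3))
g_large <- expand_density(image, c(10, 5))  # same image, different order
dim(g_small)
dim(g_large)
```

The same image yields graphs of different order depending only on the class sizes supplied, which is what makes expansion convenient for simulating membership change under a fixed group structure.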

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
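The boolean inner product mentioned above is easy to write out directly. The following base R sketch uses a hypothetical helper, `compose` (not an sna function), and a three-vertex example chosen to show how a loop can arise from two loopless graphs.

```r
# Hypothetical helper (not part of sna): Boolean composition of two digraphs.
compose <- function(a, b) {
  (a %*% b > 0) * 1
}

# G has the single edge 1 -> 2; G' has the edges 2 -> 1 and 2 -> 3.
g  <- matrix(0, 3, 3); g[1, 2] <- 1
gp <- matrix(0, 3, 3); gp[2, 1] <- 1; gp[2, 3] <- 1

gc <- compose(g, gp)
gc
diag(gc)  # entry (1, 1) is a loop, although neither input graph has loops
```

Here the path 1 -> 2 in G followed by 2 -> 1 in G' produces the loop at vertex 1, which is why the diagonal of a composition should be retained.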

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V|\,(|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
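As a check on the definitions, Equation 3 and the Hamming distance can be computed directly in base R. The helpers below (`offdiag`, `graph_cov`, `graph_cor`, `hamming`) are hypothetical names introduced for illustration and are not sna functions; the sketch assumes loopless digraphs on a common vertex set.

```r
# Hypothetical helpers (not part of sna); loopless digraphs are assumed.
offdiag <- function(a) a[row(a) != col(a)]

graph_cov <- function(g, h) {
  x <- offdiag(g)
  y <- offdiag(h)
  # Sum of centered cross-products divided by |V|(|V| - 1), as in Equation 3.
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}

graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}

hamming <- function(g, h) sum(offdiag(g) != offdiag(h))

g <- matrix(c(0, 1, 1,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
h <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 1, 0), 3, 3, byrow = TRUE)

graph_cov(g, h)
graph_cor(g, h)
hamming(g, h)
```

Since the normalizing constant cancels in the ratio, the graph correlation coincides with the ordinary Pearson correlation of the off-diagonal cells.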

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
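For very small graphs, the maximization over matchings can be carried out by brute force, which makes the idea of a structural correlation easy to see. The base R sketch below is an illustration only: `all_perms`, `offdiag_cor`, and `struct_cor` are hypothetical helpers, all vertex relabelings are treated as permissible, and the exhaustive search is feasible only for a handful of vertices.

```r
# Hypothetical helpers (not part of sna).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

offdiag_cor <- function(g, h) {
  keep <- row(g) != col(g)
  cor(g[keep], h[keep])
}

# Structural correlation by brute force: maximize over all relabelings of h.
struct_cor <- function(g, h) {
  perms <- all_perms(seq_len(nrow(h)))
  max(vapply(perms, function(p) offdiag_cor(g, h[p, p]), numeric(1)))
}

g <- matrix(c(0, 1, 0, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              0, 0, 0, 0), 4, 4, byrow = TRUE)  # directed path 1 -> 2 -> 3 -> 4
relabel <- c(3, 1, 4, 2)
h <- g[relabel, relabel]                         # isomorphic copy of g

offdiag_cor(g, h)  # labeled correlation
struct_cor(g, h)   # 1, up to floating point, because g and h are isomorphic
```

The gap between the two values is the "labeling" component: the labeled correlation reflects an arbitrary vertex numbering, while the structural value does not.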

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
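The workflow described above can be sketched with base R alone. In the following illustration the graph set is random, the Hamming distances and graph correlations are computed by hand rather than with hdist and gcor, and all object names are arbitrary choices made here.

```r
set.seed(2)  # arbitrary seed, for reproducibility only

# Five random digraphs on 8 vertices, stored as a 5 x 8 x 8 array.
m <- 5; n <- 8
graphs <- array(rbinom(m * n * n, 1, 0.4), dim = c(m, n, n))

# One row per graph, one column per off-diagonal edge variable.
keep <- which(row(diag(n)) != col(diag(n)))
edges <- t(apply(graphs, 1, function(a) a[keep]))

# Pairwise Hamming distances (Manhattan distance on 0/1 edge variables).
d <- as.matrix(dist(edges, method = "manhattan"))

coords <- cmdscale(as.dist(d), k = 2)  # metric MDS of the graph set
tree <- hclust(as.dist(d))             # hierarchical clustering of graphs
pca <- eigen(cor(t(edges)))            # network PCA on the graph correlation matrix
pca$values
```

With real data, the array of random graphs would be replaced by the observed networks; everything after the vectorization step is ordinary multivariate analysis.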

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
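The logic of a simple QAP test can be written out in base R. The sketch below is not the qaptest implementation; `qap_cor_test` and `offdiag_cor` are hypothetical helpers, and the data are simulated so that the second graph is a noisy copy of the first.

```r
set.seed(3)  # arbitrary seed, for reproducibility only

offdiag_cor <- function(g, h) {
  keep <- row(g) != col(g)
  cor(g[keep], h[keep])
}

# Hypothetical helper (not part of sna): QAP test for the graph correlation.
qap_cor_test <- function(g, h, reps = 1000) {
  obs <- offdiag_cor(g, h)
  null <- replicate(reps, {
    p <- sample(nrow(h))     # random relabeling of h's vertices
    offdiag_cor(g, h[p, p])  # rows and columns are permuted together
  })
  list(observed = obs, p_upper = mean(null >= obs), null = null)
}

n <- 15
g <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(g) <- 0
noise <- matrix(rbinom(n * n, 1, 0.1), n, n)
h <- abs(g - noise); diag(h) <- 0  # g with roughly 10% of its entries flipped

res <- qap_cor_test(g, h)
res$observed
res$p_upper
```

Permuting rows and columns jointly preserves the internal structure of each graph while breaking the correspondence between them, which is the sense in which the QAP null conditions on the observed structures.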

Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173  0.680   0.320   0.503
x1           0.9411361  2.5628914  0.985   0.015   0.019
x2           4.1473292 63.2648084  1.000   0.000   0.000
x3           1.8630911  6.4436238  1.000   0.000   0.000
x4          -0.1757242  0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
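The reparameterization mentioned above can be made explicit under one reading of the verbal description: a source "knows" the true edge value with probability d (competency) and otherwise reports 1 with probability b (bias). The mapping below follows from that reading alone; the function names are hypothetical, and the parameterization used internally by sna may differ.

```r
# Hypothetical helpers (not part of sna). d = competency, b = guessing bias.
br_to_error <- function(d, b) {
  c(e_minus = (1 - d) * (1 - b),  # true tie: source does not know, guesses 0
    e_plus  = (1 - d) * b)        # true non-tie: source does not know, guesses 1
}

error_to_br <- function(e_minus, e_plus) {
  stopifnot(e_minus + e_plus < 1)  # otherwise the implied competency is <= 0
  c(d = 1 - e_minus - e_plus,
    b = e_plus / (e_minus + e_plus))
}

err <- br_to_error(d = 0.6, b = 0.25)
err
back <- error_to_br(err[["e_minus"]], err[["e_plus"]])
back
```

Under this mapping the two error rates always sum to 1 - d, which is why the correspondence only exists when their sum is below 1 (the bias is undefined in the degenerate case where both rates are 0).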

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model, then, involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
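The conditional tie probability above is simple to evaluate numerically. The following base R sketch uses a hypothetical helper, `biased_tie_prob`, with made-up baseline and bias probabilities; it is meant only to show how the probability rises as more bias events become possible for a dyad.

```r
# Hypothetical helper (not part of sna).
# p_base: Pr(B_e); p_bias: vector of Pr(B_k); s: vector of counts s_k(i, j, T).
biased_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

p0 <- biased_tie_prob(0.05, c(0.3, 0.2), c(0, 0))  # no bias events possible
p1 <- biased_tie_prob(0.05, c(0.3, 0.2), c(1, 0))  # one event of the first kind
p2 <- biased_tie_prob(0.05, c(0.3, 0.2), c(1, 2))  # plus two of the second kind
c(p0, p1, p2)
```

With no potential bias events the expression reduces to the baseline probability, as the formula requires.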

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

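$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$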

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is that AR effects include the impact of neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
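Equations 4 and 5 can be solved for the disturbances and responses in closed form, which gives a direct way to simulate from the model. The base R sketch below assumes a single AR and a single MA weight matrix, and row-normalizes both; the normalization and all parameter values are choices made here for illustration, not requirements of the model.

```r
set.seed(4)  # arbitrary seed, for reproducibility only

n <- 30
# Hypothetical helper: random, row-normalized weight matrix without loops.
rand_w <- function(n) {
  w <- matrix(rbinom(n * n, 1, 0.2), n, n)
  diag(w) <- 0
  w / pmax(rowSums(w), 1)  # divides row i by its row sum (or by 1 if empty)
}
W <- rand_w(n)  # AR weights
Z <- rand_w(n)  # MA weights

theta <- 0.3; psi <- 0.2; sigma <- 0.1
X <- matrix(rnorm(n * 2), n, 2)
beta <- c(1, -0.5)

nu  <- rnorm(n, 0, sigma)
eps <- solve(diag(n) - psi * Z, nu)                  # Equation 5 solved for eps
y   <- solve(diag(n) - theta * W, X %*% beta + eps)  # Equation 4 solved for y

# Both equations should hold up to numerical error.
res4 <- max(abs(y - (theta * W %*% y + X %*% beta + eps)))
res5 <- max(abs(eps - (psi * Z %*% eps + nu)))
c(res4, res5)
```

Setting theta or psi to zero in this sketch yields the pure MA or pure AR special cases, respectively.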

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

       e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

    Min    1stQ   Median Mean   3rdQ   Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

    Min       1stQ      Median    Mean      3rdQ      Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
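The sensitivity of the central graph to high error rates can be seen in a small base R simulation. The sketch below does not call consensus; it relies on the fact that an edge-wise majority vote minimizes the total Hamming distance to the reports, and it assumes (for simplicity) that all informants share the same error rates. The helpers `report`, `central`, and `accuracy` are hypothetical.

```r
set.seed(5)  # arbitrary seed, for reproducibility only

n <- 12; m <- 15
g <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g) <- 0  # "true" network

# Hypothetical helper: m informant reports with common error rates.
report <- function(g, e_minus, e_plus, m) {
  n <- nrow(g)
  p <- g * (1 - e_minus) + (1 - g) * e_plus  # Pr(report = 1) for each dyad
  # rep(p, each = m) lines up with the m x n x n fill order of array().
  array(rbinom(m * n * n, 1, rep(p, each = m)), dim = c(m, n, n))
}

# Edge-wise majority vote (m is odd here, so ties cannot occur).
central <- function(dat) (apply(dat, c(2, 3), mean) > 0.5) * 1

accuracy <- function(est, g) {
  keep <- row(g) != col(g)
  mean(est[keep] == g[keep])
}

acc_low  <- accuracy(central(report(g, 0.30, 0.05, m)), g)  # modest error rates
acc_high <- accuracy(central(report(g, 0.65, 0.05, m)), g)  # false negatives above 0.5
c(acc_low, acc_high)
```

Once the false negative rate exceeds 0.5, the majority of informants is expected to miss each true tie, so adding more informants makes the central graph worse rather than better on those dyads.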

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

in some sense, then, statnet is the natural “successor” to sna. Reflecting this relationship, sna is now considered to be part of the statnet project, and is fully interoperable with other statnet packages (including network). sna may still be employed as a stand-alone package, however, for users who do not require the full range of functionality provided by statnet.

1.3. Functionality

At present, the sna package includes over 125 functions for the manipulation and analysis of network data. Supported functionality includes:

• Functions to compute descriptive indices at the graph or node level. This includes centrality and centralization indices, measures of hierarchy and prestige, brokerage, density, reciprocity, transitivity, connectedness, and the like, as well as dyad, triad, path, and cycle census statistics. Stand-alone routines to facilitate the comparison of index values across graphs via conditional uniform graph (CUG) tests are included.

• Functions to compute geodesic distances, component structure and distribution, and structure statistics (in the sense of Fararo and Sunshine 1964), and to identify isolates.

• Functions for positional and role analysis, including structural equivalence and blockmodeling.

• Functions for exploratory edge set comparison in the paradigm of Butts and Carley (2005). This includes structural covariance/correlation and distance routines, as well as tools for scaling and visualization of graph sets. Network regression (Krackhardt 1988), canonical correlation analysis, and logistic network regression are also supported; QAP (Hubert 1987; Krackhardt 1987b) and CUG tests are currently implemented for all three approaches.

• Functions to generate graph-valued deviates from various stochastic processes. So-called Erdős-Rényi graphs, inhomogeneous Bernoulli graphs, and dyad census conditioned graphs are supported, as are graphs produced by Watts-Strogatz rewiring processes (Watts and Strogatz 1998) and the biased net models of Skvoretz et al. (2004) and Rapoport (1957).

• Functions to fit network autocorrelation (also known as spatial autocorrelation; see Anselin 1988) and biased net models.

• Functions for network inference (i.e., inferring networks from multiple reports containing missing and/or error-prone data). This includes heuristic estimators such as Krackhardt's (Krackhardt 1987a) locally aggregated structure estimators and the central graph (Banks and Carley 1994), as well as model-based methods such as the Romney-Batchelder consensus model (Romney et al. 1986) and the error-rate models of Butts (2003).

• Functions for visualization and manipulation of network data (in adjacency matrix form). Standard graph layout methods such as those of Fruchterman and Reingold (1991) and Kamada and Kawai (1989), general multidimensional scaling/eigenstructure methods, and “target” diagrams (Brandes et al. 2003) are included by default, and custom layout routines are also supported. Functions are included to facilitate common tasks such as extracting neighborhoods and egocentric networks, symmetrization, application of functions to attribute information on neighborhoods (e.g., computing neighbors' mean attributes), dichotomization, permutation/relabeling, and the creation of interval graphs from spell data. Data import/export is supported for several basic file formats.

The above includes many of the methods of what is sometimes called “classical” social network analysis (exemplified by Wasserman and Faust (1994), whose presentation is now canonical), as well as some more recent contributions to the literature. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

1.4. Terminology and data representation

Because sna is a special-purpose toolkit dedicated to social network analysis, describing its functionality requires us to refer to standard SNA concepts and methods; readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below. Throughout this paper, we will be concerned with relational data, consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints, with the first element known as the tail (or sender) and the second known as the head (or receiver) in the ordered case. An edge whose endpoints are identical is called a loop. The combination of an edge set, $E$, with vertex set, $V$, is said to be a graph (denoted $G = (V, E)$). The size or order of a graph is the number of elements in its vertex set (denoted $|V|$, where $|\cdot|$ is the cardinality operator). Specific types of graphs may be identified via the constraints satisfied by $E$. If the elements of $E$ are unordered multisets, $G$ is said to be an undirected graph; if edges are ordered multisets, by contrast, $G$ is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex $v$ is called the neighborhood of $v$ (denoted $N(v)$). In the directed case, we distinguish between the set of vertices sending edges to $v$ (the in-neighborhood, or $N^-(v)$) and the set of vertices receiving edges from $v$ (the out-neighborhood, or $N^+(v)$). A graph (directed or otherwise) is simple if it has no loops, and if there exists no edge having multiplicity greater than one. Finally, a graph's edge set may be associated with a set of variables, such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary, unvalued case.
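To make the neighborhood definitions concrete, the following base R sketch computes in- and out-neighborhoods from a small adjacency matrix. It does not use sna; the example graph and the helper names `out_neighborhood` and `in_neighborhood` are hypothetical and introduced only for illustration.

```r
# Hypothetical 4-vertex digraph: A[i, j] == 1 means an edge with tail i and head j.
A <- matrix(0, nrow = 4, ncol = 4)
A[1, 2] <- 1
A[1, 3] <- 1
A[3, 1] <- 1
A[4, 3] <- 1

# Out-neighborhood N+(v): vertices receiving an edge from v (row v).
out_neighborhood <- function(A, v) which(A[v, ] != 0)

# In-neighborhood N-(v): vertices sending an edge to v (column v).
in_neighborhood <- function(A, v) which(A[, v] != 0)

graph_order <- nrow(A)           # |V|, the number of vertices
has_loops <- any(diag(A) != 0)   # a loop is an edge from a vertex to itself

out_neighborhood(A, 1)
in_neighborhood(A, 3)
```

In this toy graph vertex 1 sends edges to vertices 2 and 3, while vertex 3 receives edges from vertices 1 and 4, so the two neighborhoods of a vertex generally differ in the directed case.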

It is worth noting that use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field's strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called “points” or “nodes” (or, in social contexts, “actors” or “agents”). Likewise, edges may be called “lines,” “ties,” or (if directed) “arcs.” The term “network” is often used generically to refer to any relational structure; in other cases, it may be reserved to refer to the actually existing relational structure, with “graph” being employed for that structure's formal representation. In the latter instance, “tie” is frequently used as the corresponding term for an actually existing relationship, with “edge” denoting the formal representation of that relationship. While such terminological subtleties are not required to use sna, an awareness of them may reduce confusion among users seeking to make use of the literature cited within the package manual.

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna's functions typically default to the assumption that one's data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix, $A$, whose elements are defined such that $A_{ij}$ is the value of the $(i, j)$ edge (or $\{i, j\}$ edge, in the undirected case) in the corresponding graph. By convention, $A_{ij}$ is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects, or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (SparseM; Koenker and Ng 2007) objects may be preferred. sna accepts all three such data types interchangeably.

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternately, it is also possible to specify multiple graphs by means of a list. This allows for the user to pass graph sets of varying orders, where required. Within a graph list, single adjacency matrices, adjacency arrays, network, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
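The two multi-graph representations described above can be sketched in base R as follows. This is only an illustration of the data layout (first array dimension indexing graphs, or a list for graphs of differing order); the object names are hypothetical, and no sna functions are called.

```r
n <- 4

# Two hypothetical digraphs of the same order
g1 <- matrix(0, n, n)
g1[1, 2] <- 1
g2 <- matrix(0, n, n)
g2[2, 3] <- 1
g2[3, 2] <- 1

# Graph stack: a 3-D array whose FIRST dimension indexes the adjacency matrices
stack <- array(0, dim = c(2, n, n))
stack[1, , ] <- g1
stack[2, , ] <- g2

# Per-graph edge counts, obtained by applying over the first dimension
edge_counts <- apply(stack, 1, sum)

# A list can hold graphs of differing order
g3 <- matrix(0, 6, 6)
g3[1, 6] <- 1
graph_list <- list(g1, g3)
orders <- sapply(graph_list, nrow)
```

The array form is compact when all graphs share a vertex set, whereas the list form is the natural choice once orders differ.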

Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)
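As a minimal sketch of the matrix-based route, the following base R code writes a small adjacency matrix to a temporary text file and reads it back with read.table and with scan. The file layout (whitespace-separated, no header) is an assumption made for this example, not a format prescribed by sna.

```r
# Hypothetical 3-vertex digraph
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)

# Write the matrix as plain whitespace-separated text, without row/column names
path <- tempfile(fileext = ".txt")
write.table(A, file = path, row.names = FALSE, col.names = FALSE)

# Route 1: read.table returns a data frame, so convert it to a matrix
B <- as.matrix(read.table(path))
dimnames(B) <- NULL

# Route 2: scan returns a vector in file (row) order, so fill by row
C <- matrix(scan(path, quiet = TRUE), nrow = 3, byrow = TRUE)

unlink(path)
```

Either result can then be handed to routines that accept adjacency matrices; the byrow = TRUE argument matters in the scan route because R otherwise fills matrices column by column.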

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
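The idea of an interval graph can be illustrated with a short base R sketch: each spell becomes a vertex, and two vertices are adjacent when their intervals overlap. The data frame layout and variable names below are hypothetical and do not reproduce the input format of interval.graph; spells that merely touch at an endpoint are treated here as non-overlapping, which is an assumption of this example.

```r
# Hypothetical spell data: one row per spell, with onset and terminus times
spells <- data.frame(
  onset    = c(0, 2, 5, 9),
  terminus = c(3, 6, 8, 10)
)
n <- nrow(spells)

# overlap[i, j] holds the length of the intersection of spells i and j
overlap <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      lo <- max(spells$onset[i], spells$onset[j])
      hi <- min(spells$terminus[i], spells$terminus[j])
      overlap[i, j] <- max(0, hi - lo)
    }
  }
}

# Simple (unvalued) interval graph: adjacency wherever the overlap is positive
interval_graph <- (overlap > 0) * 1
```

Here spells 1 and 2 overlap on [2, 3] and spells 2 and 3 on [5, 6], while spell 4 overlaps nothing and so appears as an isolate. The `overlap` matrix is a valued counterpart in which each edge carries the overlap length.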


2. Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1. Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a “workhorse” function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix; although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

R> library(sna)
R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
R> gden(g)
[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs, via a resampling procedure.


gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an $(i, j)$ edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10, in which the probability of an $(i, j)$ edge is equal to $j/10$. Such a graph can be drawn easily as follows:

R> gp <- sapply((1:10) / 10, rep, 10)
R> g <- rgraph(10, tprob = gp)
R> g
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0
R> apply(g, 2, mean)
 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9
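For readers who want to see what an inhomogeneous Bernoulli draw involves, the following base R sketch mimics the example without calling rgraph. It is an illustration under stated assumptions (independent edges, loops removed afterwards), not a description of how rgraph is implemented; all object names are hypothetical.

```r
set.seed(1)
n <- 10

# tprob[i, j] = j / n: every entry in column j has the same edge probability
tprob <- sapply((1:n) / n, rep, n)

# One independent Bernoulli trial per cell; rbinom recycles prob in the same
# column-major order that matrix() uses to fill the result
g <- matrix(rbinom(n * n, size = 1, prob = tprob), n, n)
diag(g) <- 0   # disallow loops

# With the diagonal forced to zero, column j has n - 1 free entries out of n,
# so its mean has expectation ((n - 1) / n) * (j / n)
expected_col_means <- (n - 1) / n * (1:n) / n
observed_col_means <- apply(g, 2, mean)
```

Comparing `observed_col_means` with `expected_col_means` across repeated draws gives a quick sanity check on any generator of this kind.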

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation $0.9(j/10)$. The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)
[1] 12 12 12 12 12
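Conditioning on an exact edge count can be sketched in base R by choosing which cells of the adjacency matrix are occupied. The helper `draw_gnm` below is hypothetical and is not sna's rgnm; it assumes a loopless digraph and simply samples edge positions without replacement.

```r
# Hypothetical helper: place m edges uniformly at random among the
# nv * (nv - 1) off-diagonal cells of a digraph adjacency matrix.
draw_gnm <- function(nv, m) {
  g <- matrix(0, nv, nv)
  cells <- which(row(g) != col(g))                 # off-diagonal positions
  g[cells[sample.int(length(cells), m)]] <- 1      # m distinct cells
  g
}

set.seed(42)
draws <- replicate(5, draw_gnm(10, 12), simplify = FALSE)
edge_counts <- sapply(draws, sum)
```

Because every set of 12 off-diagonal cells is equally likely, each loopless digraph with exactly 12 edges has the same probability of being drawn, and the edge count is fixed rather than merely expected.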

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0
R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0
R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
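The dyad census underlying these three cases can be computed directly from an adjacency matrix. The base R helper below is a hypothetical sketch written for this illustration (it is not an sna function) and assumes an unvalued digraph without loops.

```r
# Hypothetical helper: count mutual, asymmetric, and null dyads of a 0/1 digraph
dyad_census_sketch <- function(g) {
  ut <- upper.tri(g)
  a <- g[ut]       # edge i -> j for each dyad with i < j
  b <- t(g)[ut]    # edge j -> i for the same dyads, in the same order
  c(mut  = sum(a == 1 & b == 1),
    asym = sum(a != b),
    null = sum(a == 0 & b == 0))
}

n <- 10
complete <- matrix(1, n, n)
diag(complete) <- 0
empty <- matrix(0, n, n)
tournament <- matrix(0, n, n)
tournament[upper.tri(tournament)] <- 1   # one deterministic tournament

dyad_census_sketch(complete)
dyad_census_sketch(tournament)
dyad_census_sketch(empty)
```

A graph of order 10 has choose(10, 2) = 45 dyads, which is why each of the three extreme cases places all 45 dyads in a single class.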

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
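The figures quoted for the Bernoulli comparison can be checked with a short calculation from the dyad probabilities used in the example. The symbols $p_M$, $p_A$, and $\delta$ are introduced here for illustration only: $p_M = 0.15$ and $p_A = 0.05$ are the mutual and asymmetric probabilities, and $\delta$ is the implied expected density, since a mutual dyad contributes two of its two possible edges and an asymmetric dyad contributes one.

```latex
\begin{align*}
\delta &= p_M + \tfrac{1}{2}\, p_A = 0.15 + \tfrac{1}{2}(0.05) = 0.175, \\
\Pr(\text{mutual} \mid \text{Bernoulli}) &= \delta^2 = 0.175^2 \approx 0.031, \\
\Pr(\text{asymmetric} \mid \text{Bernoulli}) &= 2\delta(1-\delta) = 2(0.175)(0.825) \approx 0.289, \\
\frac{\Pr(\text{asymmetric} \mid \text{Bernoulli})}{\Pr(\text{mutual} \mid \text{Bernoulli})}
  &= \frac{2(1-\delta)}{\delta} \approx 9.4 .
\end{align*}
```

Under the multinomial dyad model, by comparison, mutual dyads outnumber asymmetric ones by three to one ($0.15$ versus $0.05$), which is the sense in which reciprocated ties are overrepresented.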

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases (π), “sibling” or shared partner biases (σ), and “double role” biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following,

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+    d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network, and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
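What row/column permutation does to an adjacency matrix can be sketched in base R by applying the same random ordering to rows and columns. This is an illustration of the operation itself, with hypothetical object names, rather than a description of rmperm's internals.

```r
set.seed(7)
n <- 6

# A hypothetical random digraph without loops
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0

# Apply the SAME permutation to rows and columns: vertex o[k] becomes vertex k
o <- sample(n)
gp <- g[o, o]

# Labels change, structure does not: edge count and degree sequences are preserved
same_edges <- sum(gp) == sum(g)
same_outdeg <- all(sort(rowSums(gp)) == sort(rowSums(g)))
same_indeg <- all(sort(colSums(gp)) == sort(colSums(g)))
```

Permuting rows and columns with different orderings would not, in general, preserve the graph's structure, which is why a single ordering `o` is used for both.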

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple, exploratory applications, but does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
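To make the idea of an edge perturbation concrete, here is a minimal base-R sketch. The helper names `edge_perturbation` and `mutual_count` are hypothetical and are not part of sna; the sketch evaluates the statistic twice from scratch, which is exactly the generic (and inefficient) strategy described above.

```r
# Hypothetical helper (not part of sna): change in a graph statistic when
# edge (i, j) is forced present versus forced absent, all else held fixed.
edge_perturbation <- function(g, i, j, stat) {
  g_present <- g
  g_absent <- g
  g_present[i, j] <- 1
  g_absent[i, j] <- 0
  stat(g_present) - stat(g_absent)
}

# Example statistic: number of mutual (reciprocated) dyads
mutual_count <- function(g) sum(g * t(g) * upper.tri(g))

g <- matrix(0, 4, 4)
g[2, 1] <- 1                      # 2 -> 1 is already present

# Forcing 1 -> 2 present completes a mutual dyad
delta_mutual <- edge_perturbation(g, 1, 2, mutual_count)
# For the edge count, any single edge changes the statistic by one
delta_edges <- edge_perturbation(g, 1, 3, sum)
```

A specialized change-score routine would instead compute such differences directly from the local configuration around (i, j), without recomputing the full statistic.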

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
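The notion of a restricted permutation can be illustrated with a short base-R sketch. The function `restricted_perm` below is a hypothetical stand-in written for this illustration, not the rperm implementation; it shuffles vertex indices independently within each exchange class, so that no vertex ever leaves its class.

```r
# Hypothetical helper (not the sna implementation): draw a permutation
# vector that only exchanges vertices within the same exchange class.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # sample.int avoids sample()'s special handling of length-one input
    perm[idx] <- idx[sample.int(length(idx))]
  }
  perm
}

set.seed(42)
classes <- c(1, 1, 1, 2, 2, 3)
p <- restricted_perm(classes)

# The result is a valid permutation that respects class membership
is_permutation <- all(sort(p) == seq_along(classes))
respects_classes <- all(classes[p] == classes)
```

A permuted adjacency matrix would then be obtained as `g[p, p]` for a conformable matrix `g`.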

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $i,j$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th-order neighborhood is defined as the identity matrix, while for orders $k > 0$ it depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation, $R$, be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
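The definitions above translate fairly directly into code. The following base-R sketch is not the sna neighborhood function; `geodesic_dist` and `neighborhood_matrix` are hypothetical helpers, and the choice to exclude distance-0 pairs from the cumulative neighborhood of order k > 0 is an assumption made for this illustration.

```r
# Geodesic distances by repeated boolean matrix multiplication
# (a hypothetical helper; adequate for small graphs only).
geodesic_dist <- function(a) {
  n <- nrow(a)
  a <- (a > 0) * 1
  diag(a) <- 0
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  reach <- diag(n)                   # walks of length 0
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% a > 0) * 1   # pairs joined by a walk of length k
    newly <- reach == 1 & is.infinite(d)
    d[newly] <- k                    # first walk length found is the distance
  }
  d
}

# Indicator matrix of the order-k neighborhood on the chosen base relation
neighborhood_matrix <- function(g, k, type = c("out", "in", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  r <- switch(type,
              out = g,
              "in" = t(g),
              total = (g + t(g) > 0) * 1)
  d <- geodesic_dist(r)
  # Assumption: distance-0 (self) pairs are left out of cumulative neighborhoods
  if (partial) (d == k) * 1 else (d >= 1 & d <= k) * 1
}

# Directed path 1 -> 2 -> 3 -> 4
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1
p2 <- neighborhood_matrix(g, 2)                    # distance exactly 2
c2 <- neighborhood_matrix(g, 2, partial = FALSE)   # distance 1 or 2
in1 <- neighborhood_matrix(g, 1, type = "in")      # order-1 in-neighborhoods
```

On the path graph, the order-2 partial neighborhood contains only the pairs (1, 3) and (2, 4), while the cumulative version adds the three original edges.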

To illustrate sna's egocentric network tools, we begin by generating a sample network, and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x){gden(x[-1, -1])})
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
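The alter-density calculation can be reproduced in base R on a hand-built graph. This is only a sketch: `ego_net` and `alter_density` are hypothetical helpers (not ego.extract or gden), and the density shown is the directed, loopless form.

```r
# Hypothetical helper (not sna's ego.extract): combined-neighborhood ego net
# of vertex v, with ego placed in the first row/column.
ego_net <- function(g, v) {
  alters <- setdiff(which(g[v, ] > 0 | g[, v] > 0), v)
  keep <- c(v, alters)
  g[keep, keep, drop = FALSE]
}

# Density among alters only (directed, no loops); NA with fewer than 2 alters
alter_density <- function(ego) {
  a <- ego[-1, -1, drop = FALSE]     # drop ego's row and column
  m <- nrow(a)
  if (m < 2) return(NA_real_)
  diag(a) <- 0
  sum(a) / (m * (m - 1))
}

g <- matrix(0, 5, 5)
g[1, 2] <- g[1, 3] <- g[4, 1] <- 1   # ego 1 has alters 2, 3, and 4
g[2, 3] <- 1                         # a single tie among the alters
e1 <- ego_net(g, 1)
d1 <- alter_density(e1)              # 1 of 6 possible alter-alter ties
```

Because ego occupies the first row and column, removing it reduces to the `[-1, -1]` indexing used in the transcript above.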

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2 / 9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$. (Panels: Partial Neighborhood of Order 1 through Order 9.)

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$. (Panels: Cumulative Neighborhood of Order 1 through Order 9.)

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.)

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions. (Panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.)

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{dt}(v,G) \equiv c_{d+}(v,G) + c_{d-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as $c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)}$, where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v',v,v'',G)}{g(v',v'',G)}$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by $c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')}$, where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
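The degree and closeness definitions can be worked through by hand in base R. The sketch below does not use sna's degree or closeness functions; it computes geodesic distances with a simple Floyd-Warshall pass (chosen here purely for brevity) on a four-vertex directed cycle, and all variable names are illustrative.

```r
# Small directed cycle 1 -> 2 -> 3 -> 4 -> 1 (strongly connected)
n <- 4
a <- matrix(0, n, n)
a[cbind(1:4, c(2, 3, 4, 1))] <- 1

# Degree variants from row and column sums
outdeg <- rowSums(a)
indeg <- colSums(a)
freeman_deg <- outdeg + indeg

# Geodesic distances via Floyd-Warshall; Inf marks unreachable pairs
d <- ifelse(a > 0, 1, Inf)
diag(d) <- 0
for (k in seq_len(n)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Closeness: (n - 1) / sum of distances; 0 when some vertex is unreachable
dist_sum <- rowSums(d)
closeness_score <- ifelse(is.finite(dist_sum), (n - 1) / dist_sum, 0)
```

On the cycle, each vertex reaches the others at distances 1, 2, and 3, so every closeness score is 3/6. Deleting any one edge leaves a vertex with no outgoing paths at all, which drives its score to 0 under the infinite-distance convention.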

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter, $\alpha$, is by convention set so as to result in a centrality vector whose squared length (i.e., sum of squared scores) is equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
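The Bonacich formula can be evaluated directly with a linear solve. The following base-R sketch is not the bonpow implementation; `bonacich_power` is a hypothetical helper, and the scaling used (sum of squared scores equal to the number of vertices) is an assumption adopted for this illustration.

```r
# Hypothetical helper (not sna's bonpow): alpha * (I - beta A)^{-1} A 1,
# with alpha chosen so that the sum of squared scores equals |V| (assumed
# scaling convention for this sketch).
bonacich_power <- function(a, beta) {
  n <- nrow(a)
  raw <- solve(diag(n) - beta * a, a %*% rep(1, n))
  alpha <- sqrt(n / sum(raw^2))
  as.vector(alpha * raw)
}

# Small strongly connected digraph with outdegrees 2, 2, 2, 1
a <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              1, 0, 0, 0), 4, 4, byrow = TRUE)

bp0 <- bonacich_power(a, 0)          # beta = 0: proportional to outdegree
ratio <- bp0 / rowSums(a)            # constant across vertices
bp_neg <- bonacich_power(a, -0.5)    # negative beta: weak alters raise scores
```

With beta equal to 0 the inverse is the identity, so the scores reduce to a rescaled outdegree vector; nonzero beta mixes in the scores of each vertex's alters.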

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
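The role definitions above amount to a classification of each locally bridged triad. The base-R sketch below counts raw role scores by brute force; `brokerage_counts` is a hypothetical helper, not the sna brokerage function, and it computes only the observed counts (no expectations or z-scores).

```r
# Hypothetical helper (not sna's brokerage): count Gould-Fernandez roles
# for each brokering vertex j over ordered triads (i, j, k).
brokerage_counts <- function(g, s) {
  n <- nrow(g)
  roles <- c("coordinating", "itinerant", "gatekeeping",
             "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # j brokers (i, k) when i -> j -> k exists but i -> k does not
        if (g[i, j] == 0 || g[j, k] == 0 || g[i, k] != 0) next
        role <- if (s[i] == s[j] && s[j] == s[k]) {
          "coordinating"
        } else if (s[i] == s[k]) {
          "itinerant"
        } else if (s[j] == s[k]) {
          "gatekeeping"
        } else if (s[i] == s[j]) {
          "representative"
        } else {
          "liaison"
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, total = rowSums(out))
}

g <- matrix(0, 4, 4)
g[1, 2] <- g[2, 3] <- 1     # 1 -> 2 -> 3, with no 1 -> 3 edge
g[2, 4] <- 1                # 1 -> 2 -> 4, with no 1 -> 4 edge
s <- c("a", "a", "b", "a")  # illustrative group labels
b <- brokerage_counts(g, s)
```

Here vertex 2 acts once as a representative (for the pair 1, 3) and once as a coordinator (for the pair 1, 4). The triple loop is cubic in the number of vertices, which is acceptable for illustration but not for large graphs.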

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores; see footnote 2) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

Footnote 2: For instance, when all vertices are automorphically equivalent.
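The relationship between Equations 1 and 2 can be checked numerically. The following is a minimal sketch in base R (it does not call sna); the helper name `freeman_raw` and the normalizing constant for undirected degree centrality, $(n-1)(n-2)$, are assumptions made for illustration.

```r
# Raw Freeman centralization: total deviation from the maximum score (Equation 1).
freeman_raw <- function(scores) sum(max(scores) - scores)

# Undirected star on five vertices: vertex 1 is tied to every other vertex.
n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1

deg <- rowSums(star)                         # degree centrality scores
raw <- freeman_raw(deg)                      # Equation 1
alt <- length(deg) * (max(deg) - mean(deg))  # Equation 2, the same quantity
tmaxdev <- (n - 1) * (n - 2)                 # assumed maximum deviation (undirected star)
normalized <- raw / tmaxdev
print(c(raw = raw, alt = alt, normalized = normalized))
```

Because the star attains the maximum deviation for degree centrality, its normalized score is 1 in this sketch.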

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

Footnote 3: $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
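To make the distinction between density, dyadic reciprocity, and edgewise reciprocity concrete, the definitions can be computed by hand on a small digraph. This is a minimal base R sketch that does not call sna; the four-vertex adjacency matrix and all variable names are illustrative assumptions.

```r
# Small digraph with edges 1->2, 1->3, 2->1, 3->4 (no loops).
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 0,
                0, 0, 0, 1,
                0, 0, 0, 0), nrow = 4, byrow = TRUE)
n <- nrow(adj)

# Density: observed edges over potentially observable (off-diagonal) edges.
density <- sum(adj) / (n * (n - 1))

# Dyad census: compare each (i, j) entry with its (j, i) counterpart.
up <- upper.tri(adj)
ij <- adj[up]
ji <- t(adj)[up]
mutual <- sum(ij == 1 & ji == 1)
asym   <- sum(ij != ji)
null   <- sum(ij == 0 & ji == 0)

# Dyadic reciprocity: fraction of dyads that are symmetric (mutual or null).
dyadic_recip <- (mutual + null) / (mutual + asym + null)
# Edgewise reciprocity: fraction of edges that are reciprocated.
edgewise_recip <- 2 * mutual / sum(adj)

print(c(density = density, dyadic = dyadic_recip, edgewise = edgewise_recip))
```

The two reciprocity values differ here because the null dyads count as symmetric in the dyadic version but play no role in the edgewise one.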

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if (tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
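The definition of the structure statistics can be traced through directly on a small graph. The sketch below uses base R only and computes geodesic distances with a simple Floyd-Warshall pass rather than sna's geodist; the function names `geodesic_distances` and `structure_stats` are hypothetical.

```r
# All-pairs geodesic distances for a small digraph (Floyd-Warshall).
geodesic_distances <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)  # one step where an edge exists, unreachable otherwise
  diag(d) <- 0                  # every vertex is at distance 0 from itself
  for (m in seq_len(n))
    for (i in seq_len(n))
      for (j in seq_len(n))
        d[i, j] <- min(d[i, j], d[i, m] + d[m, j])
  d
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), for i = 0..N-1.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_distances(adj)
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

# Directed path 1 -> 2 -> 3.
path3 <- matrix(0, 3, 3)
path3[1, 2] <- 1
path3[2, 3] <- 1
s <- structure_stats(path3)
print(s)
```

The series is nondecreasing by construction, and it reaches 1 only when every vertex can reach every other vertex.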

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E s(H) = s(G), E s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
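The logic of a CUG test conditioning only on order can be sketched with a few lines of Monte Carlo in base R. This is an illustrative stand-in, not the cugtest implementation: the helper names are hypothetical, the test statistic (the mutual dyad count) is chosen for simplicity, and the uniform distribution over digraphs of a given order is drawn by treating each off-diagonal edge as an independent Bernoulli(0.5) trial.

```r
set.seed(1)

# Test statistic: number of mutual dyads in a loopless digraph.
count_mutual <- function(adj) sum(adj * t(adj)) / 2

# One draw from the uniform distribution on digraphs of order n.
draw_uniform_digraph <- function(n) {
  m <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(m) <- 0
  m
}

# Observed graph: the complete digraph on six vertices (every dyad mutual).
n <- 6
observed <- matrix(1, n, n)
diag(observed) <- 0
t_obs <- count_mutual(observed)

# Reference distribution of the statistic under the null.
reps <- replicate(500, count_mutual(draw_uniform_digraph(n)))
p_upper <- mean(reps >= t_obs)  # estimate of Pr(t(H) >= t(G))
p_lower <- mean(reps <= t_obs)  # estimate of Pr(t(H) <= t(G))
print(c(t_obs = t_obs, p_upper = p_upper, p_lower = p_lower))
```

Conditioning on further statistics (e.g., the number of edges) would only change how the reference graphs are drawn; the comparison step stays the same.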

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and outneighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
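The definition of exact structural equivalence for a simple (undirected, loopless) graph can be tested mechanically. The following base R sketch is illustrative only; the function name `structurally_equivalent` is hypothetical and the code does not use sna.

```r
# TRUE when v and w have the same alters, ignoring any tie between v and w themselves.
structurally_equivalent <- function(adj, v, w) {
  nv <- setdiff(which(adj[v, ] > 0), w)  # N(v) without w
  nw <- setdiff(which(adj[w, ] > 0), v)  # N(w) without v
  setequal(nv, nw)
}

# Undirected star: vertex 1 is the center, vertices 2-4 are pendants.
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1

leaves_equivalent <- structurally_equivalent(star, 2, 3)
center_vs_leaf    <- structurally_equivalent(star, 1, 2)
print(c(leaves = leaves_equivalent, center_leaf = center_vs_leaf))
```

The pendants all share the center as their only alter, so they occupy a single position, while the center forms a position of its own.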

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
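The general workflow (positional distances followed by hierarchical clustering) can be imitated with base R alone. The sketch below is a simplified stand-in for sedist and equiv.clust: it uses a naive Hamming distance between concatenated row and column profiles, and it makes no special adjustment for the entries linking the two vertices being compared, which the sna routines may treat differently.

```r
# Undirected star: vertex 1 is the center, vertices 2-4 are pendants.
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1

# Profile of vertex i: its row (out-ties) followed by its column (in-ties).
profiles <- cbind(star, t(star))
n <- nrow(star)

# Naive Hamming distance between every pair of profiles.
d <- matrix(0, n, n)
for (i in seq_len(n))
  for (j in seq_len(n))
    d[i, j] <- sum(profiles[i, ] != profiles[j, ])

# Cluster the vertices and cut the tree into two classes.
hc <- hclust(as.dist(d), method = "complete")
classes <- cutree(hc, k = 2)
print(classes)
```

Here the three pendants have identical profiles and therefore merge at height zero, leaving the center in a class of its own.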

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
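Density reduction and expansion can be mimicked in a few lines of base R. This is a rough sketch of the idea rather than the blockmodel or blockmodel.expand code: the membership vector, the exclusion of diagonal cells from block densities, and all variable names are assumptions made for illustration.

```r
set.seed(7)

# Four-vertex digraph with two classes: {1, 2} and {3, 4}.
adj <- matrix(c(0, 1, 1, 1,
                1, 0, 1, 1,
                0, 0, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)
memb <- c(1, 1, 2, 2)
k <- max(memb)

# Reduction: block density = mean cell value, ignoring the diagonal.
work <- adj
diag(work) <- NA
dens <- matrix(0, k, k)
for (a in seq_len(k))
  for (b in seq_len(k))
    dens[a, b] <- mean(work[memb == a, memb == b], na.rm = TRUE)

# Expansion: three vertices per class, ties drawn as independent Bernoulli trials
# with probability equal to the density of the corresponding block.
new_memb <- rep(seq_len(k), each = 3)
p <- dens[new_memb, new_memb]
m <- length(new_memb)
expanded <- matrix(rbinom(m * m, 1, p), m, m)
diag(expanded) <- 0
print(dens)
print(expanded)
```

Because every block density in this toy example is 0 or 1, the expanded graph is fully determined; intermediate densities would yield a different draw on each call.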

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
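The equivalence between graph composition and the boolean product of adjacency matrices is easy to verify in base R. In this minimal sketch the function name `compose` is hypothetical and stands in for the package's own operator.

```r
# Boolean matrix product: tie from v to v' iff some v'' has v -> v'' in a and v'' -> v' in b.
compose <- function(a, b) (a %*% b > 0) * 1

# G contains the single edge 1 -> 2; G' contains the edges 2 -> 1 and 2 -> 3.
g <- matrix(0, 3, 3)
g[1, 2] <- 1
g_prime <- matrix(0, 3, 3)
g_prime[2, 1] <- 1
g_prime[2, 3] <- 1

comp <- compose(g, g_prime)
print(comp)
```

Neither input has a loop, yet the composition contains the loop (1, 1) via the two-step route 1 -> 2 -> 1, which illustrates why the diagonal cannot be ignored.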

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left( A^G_{ij} - \mu_G \right) \left( A^H_{ij} - \mu_H \right)}{|V| (|V| - 1)} \tag{3}$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V| (|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
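For very small graphs, the structural Hamming distance can be found by brute force over all vertex relabelings. This base R sketch is an illustration of exhaustive search when every relabeling is permissible, not the structdist implementation; the helper names are hypothetical, and the factorial growth in the number of permutations makes the approach impractical beyond a handful of vertices.

```r
# All permutations of a vector, built recursively.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in permutations(v[-i]))
      out[[length(out) + 1]] <- c(v[i], p)
  out
}

# Labeled Hamming distance between two adjacency matrices of equal order.
hamming <- function(g, h) sum(g != h)

# Structural distance: minimum Hamming distance over all relabelings of h.
struct_dist <- function(g, h) {
  perms <- permutations(seq_len(nrow(g)))
  min(sapply(perms, function(p) hamming(g, h[p, p])))
}

# Directed path 1 -> 2 -> 3 -> 4 and a relabeled copy of it.
g <- matrix(0, 4, 4)
g[1, 2] <- 1
g[2, 3] <- 1
g[3, 4] <- 1
relabel <- c(3, 1, 4, 2)
h <- g[relabel, relabel]

print(c(labeled = hamming(g, h), structural = struct_dist(g, h)))
```

The labeled distance is positive because the two copies share no edges under the given labels, while the structural distance is zero because the graphs are isomorphic.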

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
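The QAP null hypothesis amounts to relabeling the vertices of one graph (permuting its rows and columns together) and recomputing the statistic. A minimal base R sketch of a bivariate QAP test on the graph correlation follows; it is not the qaptest code, and the helper name, graph size, and number of replications are arbitrary assumptions.

```r
set.seed(42)

# Off-diagonal cells of an adjacency matrix.
offdiag <- function(a) a[row(a) != col(a)]

# Two graphs on the same eight vertices; h is a copy of g, so they correlate perfectly.
n <- 8
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0
h <- g

obs <- cor(offdiag(g), offdiag(h))

# Null distribution: relabel the vertices of g at random and recompute the correlation.
reps <- replicate(200, {
  p <- sample(n)
  cor(offdiag(g[p, p]), offdiag(h))
})

p_upper <- mean(reps >= obs)  # estimate of Pr(f(rnd) >= f(d))
print(c(observed = obs, p_upper = p_upper))
```

Each relabeled graph retains the full structure of the original (same density, same dyad census), which is what distinguishes the QAP null from a CUG null that conditions on only a few statistics.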


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306

R> gcor(g1, g3)
[1] 0.08908708

R> gcor(g2, g3)
[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
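The point estimates in an OLS network regression coincide with those of an ordinary regression on vectorized adjacency matrices. The sketch below reproduces that step with base R's lm on simulated data; it generates its own random graphs with rbinom rather than rgraph, omits the QAP testing that netlm adds, and uses variable names chosen for illustration.

```r
set.seed(3)

# Four random predictor digraphs on 20 vertices, stored as a 4 x 20 x 20 array.
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))

# Response graph: an exact linear combination of the first three predictors.
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

# Vectorize the off-diagonal cells of every matrix.
off <- row(y) != col(y)
dat <- data.frame(y  = y[off],
                  x1 = x[1, , ][off],
                  x2 = x[2, , ][off],
                  x3 = x[3, , ][off],
                  x4 = x[4, , ][off])

# Ordinary least squares on the edge variables.
fit <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
print(round(coef(fit), 6))
```

With 20 vertices there are 380 directed edge variables and five estimated coefficients, which gives the 375 residual degrees of freedom seen in the netlm summary.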

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173 0.680   0.320   0.503
x1           0.9411361   2.5628914 0.985   0.015   0.019
x2           4.1473292  63.2648084 1.000   0.000   0.000
x3           1.8630911   6.4436238 1.000   0.000   0.000
x4          -0.1757242   0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
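When the error rates are taken as known, the posterior probability of a single edge given several informant reports follows from Bayes' rule. The base R sketch below covers only that special case for one edge variable; it is not the bbnam Gibbs sampler, and the function name, report vector, and rates are illustrative assumptions.

```r
# Posterior probability that an edge exists, given binary reports and known error rates.
# reports: 1 = informant reports the edge, 0 = informant reports no edge
# prior:   prior probability that the edge exists (Bernoulli graph prior)
# fp, fn:  false positive and false negative rates (scalars or one per informant)
edge_posterior <- function(reports, prior, fp, fn) {
  like_edge    <- prod(ifelse(reports == 1, 1 - fn, fn))  # Pr(reports | edge present)
  like_no_edge <- prod(ifelse(reports == 1, fp, 1 - fp))  # Pr(reports | edge absent)
  prior * like_edge / (prior * like_edge + (1 - prior) * like_no_edge)
}

# Three informants: two report the edge and one does not.
post <- edge_posterior(reports = c(1, 1, 0), prior = 0.5, fp = 0.1, fn = 0.2)
print(post)
```

If the false positive and false negative rates sum to more than 1, a report of an edge lowers its posterior probability, which is the "negatively informative" case permitted by the model.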

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. The bn function currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
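To make the conditional probability expression concrete, the sketch below evaluates it directly in base R. The function name bn_edge_prob and its arguments are hypothetical, chosen for illustration only; this is not how bn is implemented.

```r
# Conditional nomination probability under the biased net expression:
#   Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# p_base: baseline probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      vector of counts s_k(i, j, T) of bias events that could occur
bn_edge_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.1; the first bias event (prob 0.2) could occur once for this
# dyad, the second (prob 0.3) not at all.
p <- bn_edge_prob(p_base = 0.1, p_bias = c(0.2, 0.3), s = c(1, 0))
```

Here the nomination fails only if the baseline event and the single applicable bias event both fail, giving 1 - 0.9 * 0.8 = 0.28. With all counts at zero the expression reduces to the baseline probability.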

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
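As an illustration of the kind of autocorrelation index mentioned above, the following base R sketch computes Moran's I in its standard textbook form for a single weight matrix. The function name morans_i is hypothetical, and the normalization used by nacf may differ; this is a sketch under those assumptions rather than a description of the package's internals.

```r
# Moran's I for attribute y on a weight/adjacency matrix W (textbook form):
#   I = (n / sum(W)) * sum_ij W_ij (y_i - ybar)(y_j - ybar) / sum_i (y_i - ybar)^2
morans_i <- function(y, W) {
  n <- length(y)
  d <- y - mean(y)                       # deviations from the mean
  (n / sum(W)) * as.numeric(t(d) %*% W %*% d) / sum(d^2)
}

# Four vertices on an undirected path 1 - 2 - 3 - 4, with y increasing
# along the path (so adjacent vertices tend to have similar values).
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
I_path <- morans_i(c(1, 2, 3, 4), W)
```

For this toy case the index works out to 1/3, indicating positive autocorrelation along the path. A weight matrix producing a strongly positive index of this kind would be a natural candidate to pass on for model-based evaluation.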


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20  Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
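One rough way to check how much prior mass falls in the region where em + ep > 1 is by simulation. The sketch below uses base R only and assumes independent beta priors on the two error rates; the function name perverse_mass is hypothetical and not part of sna.

```r
# Monte Carlo estimate of the prior mass on the region em + ep > 1,
# assuming independent beta(a, b) priors on the two error rates.
perverse_mass <- function(a_em, b_em, a_ep, b_ep, n = 1e5) {
  em <- rbeta(n, a_em, b_em)
  ep <- rbeta(n, a_ep, b_ep)
  mean(em + ep > 1)
}

set.seed(1)
m_example <- perverse_mass(2, 11, 2, 11)   # beta(2, 11) priors, as in the example
m_flat    <- perverse_mass(1, 1, 1, 1)     # flat priors, for contrast
```

The beta(2, 11) priors place only a negligible fraction of their mass in this region, whereas flat priors place about half of it there, which suggests why the former are a reasonably safe generic choice.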

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
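The behavior of the data analytic estimators can be seen in a toy case. The following base R sketch implements simplified versions of the LAS and central graph rules as described above; it is an illustration under stated assumptions (informant k is also vertex k, and the central graph keeps edges reported by a strict majority), not a reproduction of the consensus function's code. The names las and central_graph are hypothetical.

```r
# reports[k, i, j] = 1 if informant k reports an i -> j edge (k also
# indexes vertices). A sketch only, not sna's implementation.
las <- function(reports, rule = c("union", "intersection")) {
  rule <- match.arg(rule)
  n <- dim(reports)[2]
  est <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    own <- c(reports[i, i, j], reports[j, i, j])  # sender's and receiver's reports
    est[i, j] <- if (rule == "union") max(own) else min(own)
  }
  est
}

# Central graph sketch: keep an edge reported by more than half of informants.
central_graph <- function(reports) {
  (apply(reports, c(2, 3), mean) > 0.5) * 1
}

# Three informants on three vertices; only the 1 -> 2 dyad is in dispute.
reports <- array(0, dim = c(3, 3, 3))
reports[1, 1, 2] <- 1   # informant 1 (the sender) reports 1 -> 2
reports[3, 1, 2] <- 1   # informant 3 (a third party) agrees
est_union <- las(reports, "union")
est_inter <- las(reports, "intersection")
est_cg    <- central_graph(reports)
```

In this case a single omission by informant 2 is enough for the intersection rule to drop the 1 -> 2 edge, while the union rule and the majority rule both retain it. This is the sense in which the intersection LAS amplifies false negatives.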

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
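The two qr.solve calls used to simulate the data in this example correspond to the reduced form of equations (4) and (5) with a single AR matrix and a single MA matrix. As a short derivation (assuming both matrices in parentheses are invertible, and writing $\theta_1$ = r1, $W_1$ = w1, $\psi_1$ = r2, and $Z_1$ = w2 to match the example):

```latex
\begin{aligned}
\varepsilon = \psi_1 Z_1 \varepsilon + \nu
  \quad &\Longrightarrow \quad
  \varepsilon = (I - \psi_1 Z_1)^{-1} \nu, \\
y = \theta_1 W_1 y + X\beta + \varepsilon
  \quad &\Longrightarrow \quad
  y = (I - \theta_1 W_1)^{-1} \left( X\beta + \varepsilon \right).
\end{aligned}
```

The fitted values rho1.1 and rho2.1 in the summary output are then estimates of $\theta_1$ and $\psi_1$, and can be compared directly with the generating values of 0.2 and 0.3.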

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot, with Theoretical Quantiles against Sample Quantiles; Net Influence Plot.)

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

custom layout routines are also supported. Functions are included to facilitate common tasks such as extracting neighborhoods and egocentric networks, symmetrization, application of functions to attribute information on neighborhoods (e.g., computing neighbors' mean attributes), dichotomization, permutation/relabeling, and the creation of interval graphs from spell data. Data import/export is supported for several basic file formats.

The above includes many of the methods of what is sometimes called "classical" social network analysis (exemplified by Wasserman and Faust (1994), whose presentation is now canonical), as well as some more recent contributions to the literature. Although the focus of the package has been on social scientific applications, many of the included tools may also be useful for analyzing networks arising from other sources.

1.4. Terminology and data representation

Because sna is a special-purpose toolkit dedicated to social network analysis, describing its functionality requires us to refer to standard SNA concepts and methods; readers unfamiliar with network analysis may wish to consult the cited references (particularly Wasserman and Faust 1994) for additional details. Some specific terminology and notation is described below.

Throughout this paper, we will be concerned with relational data, consisting of a fixed set of entities (called vertices) and a multiset of relationships among those entities (called edges). Our particular focus is on dyadic relationships, in which edges consist of (possibly ordered) two-element multisets on the set of vertices. The elements of an edge are referred to as its endpoints, with the first element known as the tail (or sender) and the second known as the head (or receiver) in the ordered case. An edge whose endpoints are identical is called a loop. The combination of an edge set, $E$, with vertex set, $V$, is said to be a graph (denoted $G = (V, E)$). The size or order of a graph is the number of elements in its vertex set (denoted $|V|$, where $|\cdot|$ is the cardinality operator).

Specific types of graphs may be identified via the constraints satisfied by $E$. If the elements of $E$ are unordered multisets, $G$ is said to be an undirected graph; if edges are ordered multisets, by contrast, $G$ is said to be a directed graph (or digraph). For an undirected graph, the set of vertices tied (or adjacent) to vertex $v$ is called the neighborhood of $v$ (denoted $N(v)$). In the directed case, we distinguish between the set of vertices sending edges to $v$ (the in-neighborhood, or $N^-(v)$) and the set of vertices receiving edges from $v$ (the out-neighborhood, or $N^+(v)$). A graph (directed or otherwise) is simple if it has no loops, and if there exists no edge having multiplicity greater than one. Finally, a graph's edge set may be associated with a set of variables, such that each edge carries some value. A graph of this kind is said to be valued, as opposed to the contrary unvalued case.
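The neighborhood definitions above can be illustrated with a small adjacency matrix. The following base R sketch uses hypothetical helper names (in_nbhd, out_nbhd) and adopts the convention that row indexes the sender and column the receiver; it does not call any sna routine.

```r
# Adjacency matrix for a small simple digraph: A[i, j] = 1 if i sends to j.
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[3, 2] <- 1; A[2, 4] <- 1

in_nbhd  <- function(A, v) which(A[, v] != 0)   # vertices sending edges to v
out_nbhd <- function(A, v) which(A[v, ] != 0)   # vertices receiving edges from v

n_in  <- in_nbhd(A, 2)    # in-neighborhood of vertex 2
n_out <- out_nbhd(A, 2)   # out-neighborhood of vertex 2

# In a 0/1 adjacency matrix, "simple" amounts to having an empty diagonal.
is_simple <- all(diag(A) == 0) && all(A %in% c(0, 1))
```

Vertex 2 here receives edges from vertices 1 and 3 and sends one to vertex 4, so its in- and out-neighborhoods differ, which is the distinction that disappears in the undirected case.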

It is worth noting that use of terminology varies somewhat across the social network field, a perhaps unfortunate legacy of the field's strongly interdisciplinary nature (Freeman 2004). Thus, vertices may also be called "points" or "nodes" (or, in social contexts, "actors" or "agents"). Likewise, edges may be called "lines," "ties," or (if directed) "arcs." The term "network" is often used generically to refer to any relational structure; in other cases, it may be reserved to refer to the actually existing relational structure, with "graph" being employed for that structure's formal representation. In the latter instance, "tie" is frequently used as the corresponding term for an actually existing relationship, with "edge" denoting the formal representation of that relationship. While such terminological subtleties are not required to use sna, an awareness of them may reduce confusion among users seeking to make use of the literature cited within the package manual.

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna's functions typically default to the assumption that one's data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix $A$ whose elements are defined such that $A_{ij}$ is the value of the $(i, j)$ edge (or $\{i, j\}$ edge, in the undirected case) in the corresponding graph. By convention, $A_{ij}$ is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects, or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (Koenker and Ng 2007, SparseM) objects may be preferred. sna accepts all three such data types interchangeably.
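To make the adjacency matrix convention concrete, the following base R sketch builds a small unvalued digraph from an edge list and reads off neighborhoods. The edge list and variable names are hypothetical, and nothing here relies on sna itself.

```r
# Hypothetical edge list for a small digraph on 4 vertices: (tail, head) pairs.
edges <- rbind(c(1, 2), c(2, 3), c(3, 1), c(3, 4))
n <- 4

# Adjacency matrix: A[i, j] = 1 if there is an (i, j) edge, 0 otherwise.
A <- matrix(0, n, n)
A[edges] <- 1

# Out- and in-neighborhoods of vertex 3.
out_nbrs <- which(A[3, ] == 1)   # vertices receiving an edge from 3
in_nbrs  <- which(A[, 3] == 1)   # vertices sending an edge to 3

# A 0/1 adjacency matrix cannot encode multiple edges, so the graph is
# simple whenever the diagonal (the loops) is empty.
is_simple <- all(diag(A) == 0)
```

Rows index senders and columns index receivers, so row sums and column sums of such a matrix give out- and in-neighborhood sizes respectively.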

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternatively, it is also possible to specify multiple graphs by means of a list. This allows the user to pass graph sets of varying orders, where required. Within a graph list, single adjacency matrices, adjacency arrays, network, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
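As an illustration of the two multi-graph layouts just described, the sketch below stacks three hypothetical adjacency matrices into an array indexed by its first dimension, and then mixes matrices and arrays of different orders in a list. It uses base R only; the object names are assumptions made for the example.

```r
n <- 4
# Three hypothetical adjacency matrices of the same order.
g1 <- matrix(0, n, n); g1[1, 2] <- 1
g2 <- matrix(0, n, n); g2[2, 3] <- 1
g3 <- matrix(0, n, n); g3[3, 4] <- 1

# Stack them in an array whose first dimension indexes the graphs.
stack <- array(0, dim = c(3, n, n))
stack[1, , ] <- g1
stack[2, , ] <- g2
stack[3, , ] <- g3

# The second graph is recovered by indexing the first dimension.
second <- stack[2, , ]

# Graphs of different orders cannot share an array, but can share a list.
graph_list <- list(g1, matrix(0, 6, 6), stack)
# The last dimension of each element gives the order of its graph(s).
orders <- sapply(graph_list, function(x) dim(x)[length(dim(x))])
```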

Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory, without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)
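For data stored in plain matrix form, a round trip through read.table or scan might look like the following sketch. The file is a temporary one created for the example, and the matrix contents are hypothetical.

```r
# Write a small hypothetical adjacency matrix to a temporary text file.
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)
path <- tempfile(fileext = ".txt")
write.table(A, file = path, row.names = FALSE, col.names = FALSE)

# read.table returns a data frame, so convert the result to a matrix.
B <- as.matrix(read.table(path, header = FALSE))
dimnames(B) <- NULL

# scan returns a flat vector, which must be reshaped row by row.
C <- matrix(scan(path, quiet = TRUE), nrow = 3, byrow = TRUE)
unlink(path)
```

Either result can then be passed to routines that accept adjacency matrices.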

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
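The idea of an interval graph can be sketched in base R as follows. This is not the interval.graph routine: the spell layout (one row per spell, onset then terminus), the use of closed intervals, and the choice of shared duration as the overlap quantity are all assumptions made for illustration.

```r
# Hypothetical spell data: one row per spell, with onset and terminus times.
spells <- rbind(c(0, 5),
                c(3, 8),
                c(6, 9),
                c(10, 12))
n <- nrow(spells)

# Two spells are adjacent if their (closed) intervals intersect.
overlap <- outer(1:n, 1:n, function(i, j) {
  spells[i, 1] <= spells[j, 2] & spells[j, 1] <= spells[i, 2]
})
A <- overlap * 1
diag(A) <- 0   # drop loops

# Valued variant: duration shared by each pair of spells (0 if disjoint).
len <- outer(1:n, 1:n, function(i, j) {
  pmax(0, pmin(spells[i, 2], spells[j, 2]) - pmax(spells[i, 1], spells[j, 1]))
})
diag(len) <- 0
```

Here spells 1 and 2 overlap, as do spells 2 and 3, while spell 4 is an isolate.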


2. Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1. Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a "workhorse" function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials, conditional on the specified parameters.¹

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process; Skvoretz et al. 2004, see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix; although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

R> library(sna)
R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
R> gden(g)
[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs, via a resampling procedure.


gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an $(i, j)$ edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10 in which the probability of an $(i, j)$ edge is equal to $j/10$. Such a graph can be drawn easily as follows:

R> g.p <- sapply((1:10) / 10, rep, 10)
R> g <- rgraph(10, tprob = g.p)
R> g
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0
R> apply(g, 2, mean)
 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; each column thus has nine free entries out of ten, and the column means here have expectation $0.9(j/10)$. The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)
[1] 12 12 12 12 12
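The notion of a uniform draw conditional on edge count can be illustrated without sna: choosing m of the n(n - 1) off-diagonal cells uniformly at random, without replacement, yields a loopless digraph with exactly m edges. The helper draw_gnm below is a hypothetical name used only for this sketch, and is not how rgnm is necessarily implemented.

```r
# Draw one digraph uniformly from those with n vertices and m edges (no loops).
# draw_gnm is a hypothetical helper for this sketch, not an sna function.
draw_gnm <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))   # all off-diagonal positions
  A[sample(cells, m)] <- 1           # choose m of them without replacement
  A
}

set.seed(1)
g <- draw_gnm(10, 12)
sum(g)   # edge count is exactly 12 by construction
```

Unlike the Bernoulli case, the edge count here does not vary from draw to draw.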

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0
R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0
R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
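A graph of order 10 has choose(10, 2) = 45 dyads, which is why the three classes are filled with 45 dyads each in the calls above. The following base R sketch counts mutual, asymmetric, and null dyads directly from a loopless 0/1 adjacency matrix; dyad_counts is a hypothetical helper written for this illustration, not an sna function.

```r
# Count mutual, asymmetric, and null dyads in a loopless 0/1 adjacency matrix.
# dyad_counts is a hypothetical helper for this sketch, not an sna function.
dyad_counts <- function(A) {
  ut <- upper.tri(A)
  ij <- A[ut]        # state of the (i, j) edge for each pair i < j
  ji <- t(A)[ut]     # state of the (j, i) edge for the same pairs
  c(mut  = sum(ij == 1 & ji == 1),
    asym = sum(ij != ji),
    null = sum(ij == 0 & ji == 0))
}

n <- 10
complete <- matrix(1, n, n); diag(complete) <- 0
empty <- matrix(0, n, n)
tournament <- matrix(0, n, n)
tournament[upper.tri(tournament)] <- 1   # exactly one edge per dyad
```

Applying dyad_counts to these three matrices places all 45 dyads in the mutual, null, and asymmetric classes respectively.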

When not in "exact" mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
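The Bernoulli comparison figures follow from a short calculation, sketched here in base R. The variable names are chosen for this example; the dyad type probabilities are those of the multinomial model above.

```r
# Dyad type probabilities used in the multinomial example.
p_mut <- 0.15; p_asym <- 0.05; p_null <- 0.8

# Expected density: a mutual dyad contributes two edges, an asymmetric dyad
# contributes one, and each dyad has two possible edges.
dens <- (2 * p_mut + p_asym) / 2

# Under a Bernoulli graph with the same density, edges are independent.
bern_mut  <- dens^2                 # both edges present
bern_asym <- 2 * dens * (1 - dens)  # exactly one edge present
ratio <- bern_asym / bern_mut       # asymmetric dyads per mutual dyad
```

The expected density is 0.175, giving a Bernoulli mutuality rate of about 0.031 and roughly 9.4 asymmetric dyads per mutual dyad, against a ratio of one third under the multinomial model.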

More extensive departures from independence require alternatives to the simple, independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge, given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce "parent" or reciprocity biases ($\pi$), "sibling" or shared partner biases ($\sigma$), and "double role" biases or parent/sibling interaction effects ($\rho$), as well as baseline density effects ($d$); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+    d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
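The effect of a row/column permutation can be checked in base R by applying one random permutation to both the rows and the columns of an adjacency matrix. The sketch below uses a hypothetical random digraph and does not call rmperm.

```r
set.seed(42)
n <- 6
# A hypothetical random digraph without loops.
A <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(A) <- 0

# Apply the same random permutation to rows and columns.
perm <- sample(n)
B <- A[perm, perm]

# The relabeled graph keeps its edge count and its degree sequences.
same_edges <- sum(A) == sum(B)
same_out <- all(sort(rowSums(A)) == sort(rowSums(B)))
same_in  <- all(sort(colSums(A)) == sort(colSums(B)))
```

Because rows and columns move together, diagonal cells stay on the diagonal, so a loopless graph remains loopless after relabeling.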

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple, exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
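The change-score idea itself is simple to express in base R: evaluate the statistic twice, once with the edge forced present and once with it forced absent, and take the difference. The helper edge_perturbation and the two example statistics below are hypothetical names written for this sketch; they are not the sna routine and make no claim about its internals.

```r
# Difference in a graph statistic when edge (i, j) is forced present
# versus forced absent, holding all other edges fixed.
# edge_perturbation is a hypothetical helper, not the sna routine.
edge_perturbation <- function(A, i, j, FUN, ...) {
  present <- A; present[i, j] <- 1
  absent  <- A; absent[i, j]  <- 0
  FUN(present, ...) - FUN(absent, ...)
}

# Example statistics: edge count and number of mutual dyads.
edge_count   <- function(A) sum(A)
mutual_count <- function(A) sum(A * t(A)) / 2

A <- matrix(0, 4, 4)
A[2, 1] <- 1
d_edges  <- edge_perturbation(A, 1, 2, edge_count)    # one more edge
d_mutual <- edge_perturbation(A, 1, 2, mutual_count)  # (2, 1) exists, so a mutual dyad forms
d_none   <- edge_perturbation(A, 3, 4, mutual_count)  # (4, 3) is absent, so nothing changes
```

Recomputing the full statistic twice is what makes this generic approach slow: for mutual_count, the change depends only on the single cell A[j, i].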

Another pair of useful (but idiosyncratic) utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the $n$th (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $(i, j)$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
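The definitions of partial and cumulative neighborhoods can be traced through a small base R sketch. The helpers geo_dist and nbhd are hypothetical names for this illustration and are not the sna implementation; in particular, omitting self-pairs (distance 0) from the cumulative neighborhood is an assumption made here.

```r
# Geodesic distances in a digraph via stepwise expansion of reachability.
# geo_dist and nbhd are hypothetical helpers for this sketch, not sna functions.
geo_dist <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, n, n)
  diag(D) <- 0
  reach <- diag(n)                        # pairs joined by a walk of length k
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% R > 0) * 1
    newly <- reach == 1 & is.infinite(D)  # pairs reached for the first time
    D[newly] <- k
  }
  D
}

nbhd <- function(G, order, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  if (order == 0) return(diag(nrow(G)))   # 0th order: identity matrix
  # Base relation R: G itself, its transpose, or the underlying graph.
  R <- switch(type, out = G, "in" = t(G), total = (G + t(G) > 0) * 1)
  D <- geo_dist(R)
  if (partial) (D == order) * 1 else (D > 0 & D <= order) * 1
}

# Directed path 1 -> 2 -> 3 -> 4.
G <- matrix(0, 4, 4)
G[cbind(1:3, 2:4)] <- 1
p2 <- nbhd(G, 2)                   # pairs at distance exactly 2
c2 <- nbhd(G, 2, partial = FALSE)  # pairs at distance 1 or 2
```

On this path, the order 2 partial neighborhood contains only the pairs (1, 3) and (2, 4), while the cumulative version adds the three original edges.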

To illustrate sna's egocentric network tools, we begin by generating a sample network, and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0
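The logic of applying a function over neighbors' covariates can be sketched in base R for first-order out-neighborhoods. The helper neighbor_apply, the example graph, and the covariate are hypothetical; the sketch covers only a small part of what gapply supports.

```r
# Apply FUN to the covariate values of each vertex's out-neighbors.
# neighbor_apply is a hypothetical helper for this sketch, not the sna routine.
neighbor_apply <- function(A, x, FUN, ...) {
  sapply(seq_len(nrow(A)), function(i) {
    FUN(x[A[i, ] != 0], ...)
  })
}

A <- matrix(0, 4, 4)
A[1, c(2, 3)] <- 1
A[2, 4] <- 1
A[3, c(1, 4)] <- 1
age <- c(30, 40, 50, 60)   # hypothetical vertex covariate

out_count <- neighbor_apply(A, rep(1, 4), sum)  # number of out-neighbors
mean_age  <- neighbor_apply(A, age, mean)       # mean covariate among out-neighbors
```

Summing a vector of ones over each neighborhood reproduces outdegree, while a vertex with no out-neighbors (vertex 4 here) yields NaN for the mean.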

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing the optional rgl package for R enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \to \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \to \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3-5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v,G) \equiv c_{d^+}(v,G) + c_{d^-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
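The degree and closeness definitions above can be checked by hand on a small example. The following is a minimal base R sketch, not the sna implementation: the four-vertex digraph and the helper name geo_dist are hypothetical, and geodesic distances are obtained with a plain Floyd-Warshall pass.

```r
# Hypothetical 4-vertex digraph; A[i, j] = 1 means an edge from i to j.
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

outdeg <- rowSums(A)        # |N+(v)|
indeg  <- colSums(A)        # |N-(v)|
totdeg <- outdeg + indeg    # total ("Freeman") degree

# Geodesic distances via Floyd-Warshall; unreachable pairs stay at Inf.
geo_dist <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) {
    D <- pmin(D, outer(D[, k], D[k, ], "+"))
  }
  D
}

D <- geo_dist(A)
# Closeness as defined in the text: (n - 1) over the sum of outgoing distances.
# Any infinite distance drives the score to 0.
closeness_sketch <- (nrow(A) - 1) / rowSums(D)
```

Because this example graph is strongly connected, every closeness score is positive; deleting the edge from vertex 3 to vertex 1 would leave several vertices with infinite distances and hence scores of 0, which is the fragility described above.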

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of squared length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
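As a rough illustration of the Bonacich formula, the sketch below evaluates $\alpha (I - \beta A)^{-1} A \mathbf{1}$ directly in base R. The function name bonpow_sketch and the example graph are hypothetical, and the choice of $\alpha$ (sum of squared scores equal to the number of vertices) is an assumption made for this sketch rather than a statement about bonpow's internals.

```r
# Hypothetical 4-vertex digraph; A[i, j] = 1 means an edge from i to j.
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

bonpow_sketch <- function(A, beta) {
  n <- nrow(A)
  # Solve (I - beta A) x = A 1 rather than forming the inverse explicitly.
  raw <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
  # Assumed scaling: alpha chosen so that the squared scores sum to n.
  raw * sqrt(n / sum(raw^2))
}

b_zero <- bonpow_sketch(A, beta = 0)      # proportional to outdegree
b_neg  <- bonpow_sketch(A, beta = -0.5)   # negative dependence on alters
ratio  <- b_zero / rowSums(A)             # constant when beta = 0
```

With beta = 0 the matrix inverse drops out and the score reduces to a rescaled outdegree, which is why the ratio is the same for every vertex.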

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v',v'')$ such that $(v',v),(v,v'') \in E$ and $(v',v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i,v_j,v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
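The role definitions above translate into a direct (if inefficient) triple loop. This is a minimal base R sketch under the definitions in the text, not the brokerage function itself; the function name brokerage_sketch, the example digraph, and the state vector are all hypothetical.

```r
# Hypothetical digraph and group memberships (states).
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
s <- c(1, 1, 2, 2)

brokerage_sketch <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinating", "itinerant", "gatekeeping",
             "representative", "liaison")
  out <- matrix(0, n, length(roles) + 1,
                dimnames = list(NULL, c(roles, "total")))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers the ordered pair (i, k): i -> j -> k with no direct i -> k edge.
    if (A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0) {
      role <- if (s[i] == s[j] && s[j] == s[k]) {
        "coordinating"
      } else if (s[i] == s[k]) {
        "itinerant"
      } else if (s[j] == s[k]) {
        "gatekeeping"
      } else if (s[i] == s[j]) {
        "representative"
      } else {
        "liaison"
      }
      out[j, role] <- out[j, role] + 1
      out[j, "total"] <- out[j, "total"] + 1
    }
  }
  out
}

bs <- brokerage_sketch(A, s)
```

The cubic loop is adequate for small graphs only; it is meant to make the counting rule explicit, not to be fast.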

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")

 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")

 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)

 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)

 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)

 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)

 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)

 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)

 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)

 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")

 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+     evcent(dat, rescale = TRUE)) < 1e-10)

[1] TRUE

R> bonpow(dat, exponent = -0.5)

 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores^2) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),^3 although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

^2 For instance, when all vertices are automorphically equivalent.
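Equations (1) and (2) can be verified numerically on any score vector. The sketch below is a base R illustration only: freeman_sketch is a hypothetical helper, the theoretical maximum deviation must be supplied by the caller, and the example uses outdegree on a five-vertex out-star, for which that maximum is $(n-1)^2$.

```r
freeman_sketch <- function(scores, tmaxdev = NULL) {
  raw <- sum(max(scores) - scores)                       # equation (1)
  alt <- length(scores) * (max(scores) - mean(scores))   # equation (2)
  stopifnot(isTRUE(all.equal(raw, alt)))                 # the two forms agree
  if (is.null(tmaxdev)) raw else raw / tmaxdev           # normalize on request
}

# Hypothetical out-star: vertex 1 sends an edge to each of the other four.
n <- 5
A <- matrix(0, n, n)
A[1, -1] <- 1
outdeg <- rowSums(A)

raw_score  <- freeman_sketch(outdeg)
norm_score <- freeman_sketch(outdeg, tmaxdev = (n - 1)^2)
```

The out-star attains the maximum possible outdegree deviation, so its normalized score is 1, while any graph with equal outdegrees would score 0.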

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)

[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)

[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")

[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)

[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)

[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")

[1]   0  21 106 254 582

R> connectedness(g)

[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)

[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")

[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)

[1] 0.2 1.0 1.0 1.0 1.0

^3 $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
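The distinction between density, dyadic reciprocity, and edgewise reciprocity can be seen on a graph small enough to count by hand. This is a base R sketch under the definitions given in the text, not a reimplementation of gden or grecip; the graph and all variable names are hypothetical.

```r
# Hypothetical digraph with edges 1->2, 2->1, 1->3, 3->4.
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 1, 3), c(2, 1, 3, 4))] <- 1
n <- nrow(A)

ut   <- upper.tri(A)                           # one entry per unordered dyad
mut  <- sum(A[ut] == 1 & t(A)[ut] == 1)        # mutual dyads
null <- sum(A[ut] == 0 & t(A)[ut] == 0)        # null dyads
asym <- choose(n, 2) - mut - null              # asymmetric dyads

dens     <- sum(A) / (n * (n - 1))             # fraction of possible edges present
dyadic   <- (mut + null) / choose(n, 2)        # fraction of symmetric dyads
edgewise <- 2 * mut / (2 * mut + asym)         # fraction of edges reciprocated
```

Here the dyadic measure is inflated by the three null dyads, whereas the edgewise measure considers only the four edges actually present, two of which are reciprocated.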

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+     n <- NROW(dat)
+     if (tmaxdev)
+         return((n - 1) * choose(n - 1, 2))
+     odeg <- degree(dat, cmode = "outdegree")
+     choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
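The structure statistics follow directly from a matrix of geodesic distances. The base R sketch below is illustrative only: geo_dist and structure_sketch are hypothetical helpers (the former a plain Floyd-Warshall pass), and the example digraph is made up for the purpose.

```r
# Hypothetical 4-vertex digraph; A[i, j] = 1 means an edge from i to j.
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Geodesic distances via Floyd-Warshall; unreachable pairs stay at Inf.
geo_dist <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) {
    D <- pmin(D, outer(D[, k], D[k, ], "+"))
  }
  D
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), i = 0..N-1.
structure_sketch <- function(D) {
  N <- nrow(D)
  sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
}

ss <- structure_sketch(geo_dist(A))
```

The series always starts at 1/N (each vertex reaches itself at distance 0) and is non-decreasing; it reaches 1 only when every ordered pair is connected by a path.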

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
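The logic of a CUG test conditioning only on order can be sketched with a few lines of Monte Carlo in base R. This is a simplified illustration, not the cugtest routine: recip_sketch and rand_digraph are hypothetical helpers, the "observed" graph is itself simulated, and the statistic is the dyadic reciprocity.

```r
set.seed(1)

# Fraction of dyads that are symmetric (mutual or null).
recip_sketch <- function(A) {
  ut <- upper.tri(A)
  mean(A[ut] == t(A)[ut])
}

# Uniform draw over loopless digraphs of order n when p = 0.5.
rand_digraph <- function(n, p = 0.5) {
  A <- matrix(rbinom(n * n, 1, p), n, n)
  diag(A) <- 0
  A
}

obs    <- rand_digraph(10, p = 0.2)   # stand-in for an observed sparse network
t_obs  <- recip_sketch(obs)
t_null <- replicate(1000, recip_sketch(rand_digraph(10)))

p_upper <- mean(t_null >= t_obs)      # estimate of Pr(t(H) >= t(G))
p_lower <- mean(t_null <= t_obs)      # estimate of Pr(t(H) <= t(G))
```

Conditioning on density as well would require drawing null graphs with the observed number of edges (or, more weakly, with edge probability equal to the observed density) instead of p = 0.5; sparse graphs have many null dyads, so the choice of null can change the conclusion substantially.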

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+     dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+     cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1, , ])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+     g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and outneighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel, described below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
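The distance-then-cluster workflow described above can be sketched in base R with a Hamming distance on vertex profiles followed by hclust. This is an illustration under stated assumptions rather than a reproduction of sedist or equiv.clust: se_hamming is a hypothetical helper that compares rows and columns while ignoring the entries involving the two vertices being compared, and the example graph is made up.

```r
# Hypothetical graph in which vertices 1, 2 and vertices 3, 4 share the same alters.
A <- matrix(c(0, 0, 1, 1,
              0, 0, 1, 1,
              1, 1, 0, 0,
              1, 1, 0, 0), nrow = 4, byrow = TRUE)

se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    keep <- setdiff(seq_len(n), c(i, j))           # third parties only
    D[i, j] <- sum(A[i, keep] != A[j, keep]) +     # differences in outgoing ties
               sum(A[keep, i] != A[keep, j])       # differences in incoming ties
  }
  D
}

D <- se_hamming(A)
clust <- hclust(as.dist(D))          # default (complete linkage) clustering
positions <- cutree(clust, h = 0)    # exact equivalence: merge only at distance 0
```

Cutting at h = 0 recovers exact structural equivalence classes; raising the cut height groups vertices whose neighborhoods are merely similar.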

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i,j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
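Density reduction and expansion can both be written out in a few lines of base R. The sketch below follows the description in the text but is not the blockmodel or blockmodel.expand code: block_density and expand_blocks are hypothetical helpers, diagonal entries are ignored within blocks, and the example graph and memberships are made up.

```r
set.seed(2)

A <- matrix(c(0, 0, 1, 1,
              0, 0, 1, 1,
              1, 1, 0, 0,
              1, 1, 0, 0), nrow = 4, byrow = TRUE)
memb <- c(1, 1, 2, 2)   # hypothetical class memberships

# Mean cell value within each block, ignoring self-ties on the diagonal.
block_density <- function(A, memb) {
  k <- max(memb)
  B <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) for (b in seq_len(k)) {
    blk <- A[memb == a, memb == b, drop = FALSE]
    if (a == b) {
      m <- nrow(blk)
      B[a, b] <- if (m > 1) (sum(blk) - sum(diag(blk))) / (m * (m - 1)) else NA_real_
    } else {
      B[a, b] <- mean(blk)
    }
  }
  B
}

# Treat block densities as edge probabilities and draw a Bernoulli digraph
# with sizes[a] vertices in class a (sizes need not match the original graph).
expand_blocks <- function(B, sizes) {
  cls <- rep(seq_along(sizes), sizes)
  P <- B[cls, cls]
  n <- length(cls)
  G <- matrix(rbinom(n * n, 1, P), n, n)
  diag(G) <- 0
  G
}

B <- block_density(A, memb)
G <- expand_blocks(B, sizes = c(3, 3))
```

Because the example blocks have densities of exactly 0 or 1, the expanded graph is deterministic; intermediate densities would yield a different random draw on each call.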

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v,v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v,v'') \in G$ and $(v'',v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
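The boolean inner product mentioned above is easy to reproduce with ordinary matrix multiplication. The following base R sketch uses a hypothetical helper, compose_sketch, and a made-up three-vertex graph; it is meant only to show how loops can arise in a composition.

```r
# Boolean inner product of two adjacency matrices on the same vertex set.
compose_sketch <- function(G1, G2) {
  (G1 %*% G2 > 0) * 1
}

# Hypothetical loopless digraph with a single mutual dyad between vertices 1 and 2.
G <- matrix(0, 3, 3)
G[1, 2] <- 1
G[2, 1] <- 1

GG <- compose_sketch(G, G)   # two-step reachability within G
```

Composing the graph with itself sends vertex 1 to vertex 2 and back again, so the result has loops at vertices 1 and 2 even though the input has an empty diagonal.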

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
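Equation 3 and the associated quantities can be checked by hand on a small example. The following base R sketch assumes loopless directed graphs stored as adjacency matrices; the helper names (offdiag, graph_mean, graph_cov, graph_cor, hamming) are introduced here for illustration and are not the sna functions.

```r
# Sketch: graph mean, covariance, correlation, and Hamming distance
# computed directly from Equation 3 (directed case, diagonal excluded).
offdiag    <- function(A) A[row(A) != col(A)]   # the |V|(|V|-1) edge variables
graph_mean <- function(A) mean(offdiag(A))
graph_cov  <- function(A, B) {
  mean((offdiag(A) - graph_mean(A)) * (offdiag(B) - graph_mean(B)))
}
graph_cor  <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}
hamming    <- function(A, B) sum(offdiag(A) != offdiag(B))

G <- matrix(0, 4, 4); G[1, 2] <- G[2, 3] <- G[3, 4] <- 1  # 1->2, 2->3, 3->4
H <- matrix(0, 4, 4); H[1, 2] <- H[2, 3] <- H[4, 1] <- 1  # 1->2, 2->3, 4->1

graph_cov(G, H)   # 5/48
graph_cor(G, H)   # 5/9
hamming(G, H)     # 2 (edges 3->4 and 4->1 differ)
```

Because the same normalization appears in the numerator and denominator, the graph correlation coincides with the ordinary Pearson correlation of the two vectors of off-diagonal cells.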

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
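As a rough illustration of handing graph distances to standard multivariate tools, the following base R sketch builds a Hamming distance matrix for a small set of random digraphs and passes it to cmdscale and hclust. The graphs are arbitrary simulated inputs, and the distance loop is written out by hand rather than taken from the package.

```r
# Sketch: exploratory analysis of a set of graphs via pairwise Hamming distances.
set.seed(7)
graphs <- replicate(6, matrix(rbinom(25, 1, 0.4), 5, 5), simplify = FALSE)
off <- row(graphs[[1]]) != col(graphs[[1]])   # ignore the diagonal

m <- length(graphs)
D <- matrix(0, m, m)
for (a in seq_len(m)) {
  for (b in seq_len(m)) {
    D[a, b] <- sum(graphs[[a]][off] != graphs[[b]][off])
  }
}

coords <- cmdscale(as.dist(D), k = 2)              # metric MDS of the graph set
tree   <- hclust(as.dist(D), method = "average")   # hierarchical clustering
groups <- cutree(tree, k = 2)                      # e.g., two clusters of graphs
```

The objects coords and tree can then be plotted or inspected with the usual base R methods.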

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
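The logic of a QAP test for a bivariate statistic can be sketched in base R as follows. This is an illustrative permutation test for the graph correlation, not the package's qaptest routine; qap_cor_test and its arguments are hypothetical names, and the input graphs are simulated.

```r
# Sketch: QAP-style permutation test for the correlation between two digraphs.
# qap_cor_test() is a hypothetical helper written for this illustration.
qap_cor_test <- function(A, B, reps = 500) {
  off <- row(A) != col(A)
  n <- nrow(A)
  obs <- cor(A[off], B[off])
  null <- replicate(reps, {
    p <- sample(n)            # random relabeling of the vertices of A
    Ap <- A[p, p]             # permute rows and columns simultaneously
    cor(Ap[off], B[off])
  })
  list(obs = obs, null = null,
       p_ge = mean(null >= obs),   # Pr(null >= observed)
       p_le = mean(null <= obs))   # Pr(null <= observed)
}

set.seed(11)
A <- matrix(rbinom(100, 1, 0.5), 10, 10)
flip <- matrix(runif(100) < 0.1, 10, 10)
B <- A
B[flip] <- 1 - B[flip]        # B is a noisy copy of A
res <- qap_cor_test(A, B)
```

Permuting rows and columns together preserves the structure of A while breaking its alignment with B, which is the sense in which individuals are "reallocated to structural positions" under the null.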


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate       Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14  0.000   1.000   0.000
x1           1.000000e+00  1.000   0.000   0.000
x2           4.000000e+00  1.000   0.000   0.000
x3           2.000000e+00  1.000   0.000   0.000
x4          -7.553990e-16  0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) {1 / (1 + exp(-a))})
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173  0.680   0.320   0.503
x1           0.9411361   2.5628914  0.985   0.015   0.019
x2           4.1473292  63.2648084  1.000   0.000   0.000
x3           1.8630911   6.4436238  1.000   0.000   0.000
x4          -0.1757242   0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
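To make the error-rate parameterization concrete, consider the simplest case in which the false negative and false positive rates are treated as known. The base R sketch below applies Bayes' rule to a single edge given several informant reports; edge_posterior is a hypothetical helper written for this illustration (it is not part of the bbnam family), and the rates and prior are arbitrary values.

```r
# Sketch: posterior probability of one edge given informant reports,
# assuming known error rates and a Bernoulli prior on the edge.
# edge_posterior() is a hypothetical helper, not an sna function.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  # Likelihood of the reports if the edge is present:
  # a report of 1 has probability 1 - em, a report of 0 has probability em
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))
  # Likelihood if the edge is absent:
  # a report of 1 has probability ep, a report of 0 has probability 1 - ep
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Three informants, two of whom report the tie
post <- edge_posterior(reports = c(1, 1, 0),
                       em = rep(0.40, 3),   # false negative rates (assumed)
                       ep = rep(0.05, 3))   # false positive rates (assumed)
post   # about 0.98
```

With false negatives much more common than false positives, two positive reports outweigh one negative report; the full model additionally places priors on the error rates and samples them jointly with the graph.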

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
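The conditional tie probability above is simple to evaluate numerically. In the following base R sketch, bias_tie_prob is a hypothetical helper, and the baseline probability, bias probabilities, and event counts are arbitrary illustrative values rather than estimates from any data set.

```r
# Sketch: conditional probability of a nomination under a biased net model.
# bias_tie_prob() is a hypothetical helper written for this illustration.
#   p_base: Pr(B_e), the baseline tie probability
#   p_bias: vector of Pr(B_k), one per bias type
#   s:      vector of counts s_k of potential bias events for the dyad
bias_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Two bias types; one potential event of the first type, two of the second
bias_tie_prob(p_base = 0.1, p_bias = c(0.3, 0.2), s = c(1, 2))
# 1 - 0.9 * 0.7 * 0.8^2 = 0.5968

# With no potential bias events, only the baseline probability remains
bias_tie_prob(p_base = 0.1, p_bias = c(0.3, 0.2), s = c(0, 0))
# 0.1
```

The tie fails to form only if the baseline event and every potential bias event all fail, which is why the probabilities of failure are multiplied.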

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

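$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$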

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
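For simulation purposes it can help to write Equations 4 and 5 in reduced form. The following derivation is a sketch that assumes the matrices $I - \sum_i \theta_i W_i$ and $I - \sum_i \psi_i Z_i$ are invertible; $I$ denotes the $n \times n$ identity matrix, a symbol introduced here.

```latex
\begin{align*}
\Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)\varepsilon &= \nu
  &&\Longrightarrow&
  \varepsilon &= \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \\
\Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr) y &= X\beta + \varepsilon
  &&\Longrightarrow&
  y &= \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1}
       \Bigl(X\beta + \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu\Bigr).
\end{align*}
```

Under these assumptions, a draw from the model can be generated by sampling $\nu$, solving the first linear system for $\varepsilon$, and then solving the second for $y$.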


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116    Med: 0.9992194    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
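The sensitivity of the intersection LAS to false negatives can be seen in a toy case. The base R sketch below implements simplified versions of the central graph (a tie is kept when at least half of the informants report it) and of the union and intersection LAS (which use only the reports of the two parties to each potential tie). The helper names are hypothetical, each informant is assumed to be one of the vertices, and the code is an illustration of the estimators' logic rather than the consensus function itself.

```r
# Sketch: simplified central graph and LAS estimators (base R only).
# dat is an informant x sender x receiver array; informant k is vertex k.
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) >= 0.5) * 1   # tie reported by at least half
}

las <- function(dat, rule = c("union", "intersection")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        pair <- c(dat[i, i, j], dat[j, i, j])  # reports of sender and receiver
        out[i, j] <- if (rule == "union") max(pair) else min(pair)
      }
    }
  }
  out
}

# True graph and error-free reports from four informants
g <- matrix(0, 4, 4); g[1, 2] <- g[2, 3] <- g[3, 1] <- 1
dat <- array(0, dim = c(4, 4, 4))
for (k in 1:4) dat[k, , ] <- g

# A single false negative: informant 1 fails to report the tie 1 -> 2
dat[1, 1, 2] <- 0

cg <- central_graph(dat)         # still recovers the tie 1 -> 2
lu <- las(dat, "union")          # still recovers the tie 1 -> 2
li <- las(dat, "intersection")   # loses the tie 1 -> 2
```

One missed report from a party to the tie is enough to remove it from the intersection LAS, while the union rule and the majority-based central graph are unaffected here.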

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error Z value Pr(>|z|)
X1     -0.331259  0.010831   -30.58  <2e-16 ***
X2      0.535608  0.009448    56.69  <2e-16 ***
X3     -0.685068  0.007138   -95.98  <2e-16 ***
X4      0.691812  0.008417    82.19  <2e-16 ***
X5      0.016491  0.007890     2.09  0.0366 *
rho1.1  0.194935  0.002575    75.71  <2e-16 ***
rho2.1  0.307491  0.021167    14.53  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma 0.09597  9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. $(i, j)$ pairs for which $i$'s net influence on $j$ is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which $i$'s net influence on $j$ is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems. Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments
literature cited within the package manual

With rare exceptions, sna routines can be used with directed or undirected graphs, with or without loops. Edge values and missing data (i.e., edges whose states are unknown) are supported in many applications as well. Note, however, that many graph theoretic concepts (e.g., connectedness) admit somewhat different definitions in the directed and undirected cases; it is thus important to verify that one is using the settings which are appropriate to the data at hand. Except for functions whose behavior is undefined in the directed case, sna’s functions typically default to the assumption that one’s data consists of one or more simple, unvalued digraphs.

Relational data can be represented in a number of ways, several of which are currently supported by the sna package. The most basic of these is the adjacency matrix, i.e., a square matrix, $A$, whose elements are defined such that $A_{ij}$ is the value of the $(i, j)$ edge (or $\{i, j\}$ edge, in the undirected case) in the corresponding graph. By convention, $A_{ij}$ is a dichotomous indicator variable where the corresponding graph is unvalued. Such matrices may be passed as matrix objects, or as two-dimensional arrays. While adjacency matrices are convenient to work with, they are inefficient for large, sparse graphs. When working with such data, the use of network (Butts et al. 2007) or sparse matrix (Koenker and Ng 2007, SparseM) objects may be preferred. sna accepts all three such data types interchangeably.
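The adjacency matrix convention described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the edge list, the order `n`, and the object names are hypothetical, and no sna function is involved.

```r
# Hypothetical directed edge list on 4 vertices: each row is one (i, j) edge.
edges <- rbind(c(1, 2), c(2, 3), c(3, 1), c(3, 4))
n <- 4

# Square matrix with A[i, j] = 1 when the (i, j) edge is present, 0 otherwise.
A <- matrix(0, nrow = n, ncol = n)
A[edges] <- 1

# Undirected counterpart: each {i, j} edge appears in both triangles.
A_sym <- pmax(A, t(A))

# Density of the digraph, ignoring the diagonal (i.e., no loops).
density <- sum(A) / (n * (n - 1))
```

Because the graph is unvalued, every cell is a 0/1 indicator; a valued graph would simply store the edge value in place of 1.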

In many instances, one may need to perform operations on multiple graphs at once. Where such graphs are of the same order (i.e., number of vertices), they may be conveniently represented by a three-dimensional array whose first dimension indexes the component adjacency matrices. Alternately, it is also possible to specify multiple graphs by means of a list. This allows for the user to pass graph sets of varying orders, where required. Within a graph list, single adjacency matrices, adjacency arrays, network, and sparse matrix objects may be mixed as desired; individual graphs are unpacked sequentially in ascending list and array index order prior to computation.
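A minimal base R sketch of the two multi-graph representations just described is given below. The matrices and object names are hypothetical; the only convention taken from the text is that the first array dimension indexes graphs.

```r
# Two hypothetical order-3 digraphs as adjacency matrices.
g1 <- matrix(c(0, 1, 0,
               0, 0, 1,
               1, 0, 0), nrow = 3, byrow = TRUE)
g2 <- matrix(c(0, 1, 1,
               1, 0, 1,
               1, 1, 0), nrow = 3, byrow = TRUE)

# Graph stack: the FIRST dimension indexes graphs, so stack[k, , ] is graph k.
stack <- array(0, dim = c(2, 3, 3))
stack[1, , ] <- g1
stack[2, , ] <- g2

# Graphs of differing order cannot share one array, so a list is used instead;
# matrices and stacks may be mixed within it.
g3 <- matrix(0, nrow = 5, ncol = 5)
graph_list <- list(g1, stack, g3)

# Per-graph edge counts, computed over the first dimension of the stack.
edge_counts <- apply(stack, 1, sum)
```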

Importing relational data into R

Another preliminary issue of obvious concern is the importation of relational data into R. Where such data is stored in matrix or array form, conventional R routines such as read.table and scan may be employed in the usual manner. Similarly, natively saved network objects may be loaded directly into memory without external representation. In addition to these methods, sna includes custom routines for importing relational data in OrgStat NOS and GraphViz DOT formats. Processed relational data can be saved via the above methods, or in the DL format widely used by packages such as Pajek and UCINET. (See also the Pajek import function in network.)

Beyond these network-specific approaches, sna also has facilities for converting spell data (i.e., data consisting of intervals in time or other quantities) into interval graphs (West 1996). The eponymously named interval.graph function serves in this capacity, converting an array of spell information into one or more interval graphs; spell-level categorical covariate information may also be included. In addition to simple interval graphs, interval.graph will compute the valued overlap graphs proposed by Butts and Pixley (2004) for use with life history data. In this case, the overlap quantities are stored as edge values in the output adjacency matrix (or matrices, if multiple spell sets were given).
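To make the idea of an interval graph concrete, the following base R sketch builds one from a small, hypothetical spell table. It does not reproduce the input format or options of interval.graph, and the overlap measure used here (length of the shared interval) is an assumption made for illustration rather than the exact Butts and Pixley (2004) definition.

```r
# Hypothetical spell data: one row per spell; columns are onset and termination.
spells <- rbind(c(0, 5), c(3, 8), c(6, 9), c(10, 12))
n <- nrow(spells)

# Length of the overlap between spells i and j (0 if they do not intersect).
overlap <- matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      shared <- min(spells[i, 2], spells[j, 2]) - max(spells[i, 1], spells[j, 1])
      overlap[i, j] <- max(0, shared)
    }
  }
}

# Simple interval graph: two spells are adjacent iff they overlap.
interval_adj <- (overlap > 0) * 1
```

In this sketch, spells that merely touch at an endpoint have overlap 0 and are therefore treated as non-adjacent; a different convention would only require changing the final comparison.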

2. Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1. Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a “workhorse” function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials conditional on the specified parameters.¹

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process, Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process (Skvoretz et al. 2004); see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix; although sna routines accept network and related objects as input (per Section 1.4), the package’s current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library, and fix the random seed (for reproducibility):

R> library(sna)
R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))
R> gden(g)

[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.

gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an (i, j) edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10, in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

R> g.p <- sapply((1:10) / 10, rep, 10)
R> g <- rgraph(10, tprob = g.p)
R> g

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0

R> apply(g, 2, mean)

 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)

[1] 12 12 12 12 12
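The two sampling schemes used so far can be imitated in base R, which may help clarify what each conditions on. The sketch below is an illustration only and is not how rgraph or rgnm are implemented; the seed and all object names are arbitrary choices.

```r
set.seed(1913)  # arbitrary seed, used only to make this sketch reproducible
n <- 10

# Inhomogeneous Bernoulli digraph: P[i, j] = j / 10, with loops disallowed.
P <- matrix(rep((1:n) / 10, each = n), nrow = n, ncol = n)
B <- (matrix(runif(n * n), nrow = n, ncol = n) < P) * 1
diag(B) <- 0

# Column j has n - 1 free cells, each present with probability j / 10, so its
# mean (taken over all n cells) has expectation 0.9 * (j / 10).
expected_col_means <- ((n - 1) / n) * (1:n) / 10

# Uniform digraph with exactly m edges: choose m of the n * (n - 1)
# off-diagonal cells without replacement.
m <- 12
U <- matrix(0, nrow = n, ncol = n)
off_diag <- which(row(U) != col(U))
U[sample(off_diag, m)] <- 1
```

The Bernoulli draw fixes only the expected edge count, whereas the second draw fixes the realized edge count exactly, which is the distinction between the two models in the text.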

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0

R> t10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0

R> n10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])

[1] 0.1482828

R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])

[1] 0.04646465

R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])

[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
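The figures quoted in this comparison follow from a short calculation, sketched here in base R. The only inputs are the dyad-type probabilities from the rguman call above; the variable names are illustrative.

```r
# Dyad-type probabilities used in the rguman call above.
p_mut  <- 0.15
p_asym <- 0.05
p_null <- 0.80

# Expected density: of the two possible edges in a dyad, a mutual dyad
# contributes both and an asymmetric dyad contributes one.
dens <- p_mut + p_asym / 2

# In a Bernoulli graph with this density, the two edges of a dyad are
# independent, which fixes the dyad-type rates.
b_mut  <- dens^2
b_asym <- 2 * dens * (1 - dens)
b_null <- (1 - dens)^2

# How many asymmetric dyads are expected per mutual dyad.
ratio <- b_asym / b_mut
```

Here `dens` is 0.175, `b_mut` is about 0.031, and `ratio` is about 9.4, matching the values in the text; under the multinomial model the corresponding ratio is 0.05/0.15, i.e., about 0.33.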

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which, by default, correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases (π), “sibling” or shared partner biases (σ), and “double role” biases, or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+    d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0

R> sum(g - g2) == 0

[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))

[1] TRUE

Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph’s isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
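What a row/column permutation does to an adjacency matrix can be shown directly in base R. The sketch below is an illustration of the operation, not the source of rmperm; the example digraph and the seed are arbitrary.

```r
set.seed(1913)  # arbitrary seed for this sketch

# A small hypothetical digraph.
g <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)

# Draw a random vertex ordering and apply it to rows and columns together,
# so that vertex o[a] takes over position a.
o <- sample(nrow(g))
g_perm <- g[o, o]

# The relabeling is undone by applying the inverse permutation.
g_back <- g_perm[order(o), order(o)]
```

Because the same ordering is applied to both margins, edge counts, the sorted degree sequences, and the (empty) diagonal are all unchanged; only the vertex labels move.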

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R’s diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)

[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency. eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple, exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
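The generic perturbation calculation, and the reason it is slower than a specialized change score, can be sketched in base R. The functions `mutual_count` and `edge_perturbation` below are hypothetical helpers written for this illustration; they are not part of sna or ergm.

```r
# Graph-level statistic used for illustration: the number of mutual dyads.
mutual_count <- function(A) sum(A * t(A)) / 2

# Difference in `stat` between forcing edge (i, j) to be present and forcing
# it to be absent, holding all other edges fixed.
edge_perturbation <- function(A, i, j, stat) {
  A_present <- A
  A_present[i, j] <- 1
  A_absent <- A
  A_absent[i, j] <- 0
  stat(A_present) - stat(A_absent)
}

# Hypothetical digraph: the directed 3-cycle 1 -> 3 -> 2 -> 1.
g <- matrix(c(0, 0, 1,
              1, 0, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)

d12 <- edge_perturbation(g, 1, 2, mutual_count)  # (2, 1) exists, so this is 1
d13 <- edge_perturbation(g, 1, 3, mutual_count)  # (3, 1) is absent, so this is 0
```

The generic version evaluates the full statistic twice per edge, whereas for this particular statistic the change score is simply g[j, i]; exploiting that kind of shortcut is what specialized change-score routines do.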

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph’s adjacency matrix is equivalent to altering the “identities” of its vertices, while leaving the underlying “unlabeled” structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
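A restricted permutation of the kind described above can be sketched in base R as follows. The helper `restricted_perm` and the class vector are hypothetical and are not the interface of rperm; the sketch only shows the idea of shuffling vertices within equivalence classes.

```r
set.seed(1913)  # arbitrary seed for this sketch

# Hypothetical equivalence class memberships for six vertices.
classes <- c(1, 1, 1, 2, 2, 3)

# Shuffle vertex indices within each class, leaving class membership intact.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    members <- which(classes == cl)
    # sample(x) on a length-one x would sample from 1:x, so index explicitly.
    perm[members] <- members[sample.int(length(members))]
  }
  perm
}

p <- restricted_perm(classes)
```

Each class is shuffled uniformly and independently, so the result is uniform over all permutations that respect the classes; applying `p` to both rows and columns of an adjacency matrix then exchanges only equivalent vertices.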

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna’s descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or “ego net”) of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors’ covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex’s 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
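The neighborhood-then-apply pattern can be imitated in base R for first-order out-neighborhoods. The helper `neighbor_apply`, the example digraph, and the `age` covariate are all hypothetical; the sketch does not cover the in, combined, or higher-order options of gapply.

```r
# Apply `fun` to the covariate values of each vertex's out-neighbors.
neighbor_apply <- function(A, covar, fun) {
  sapply(seq_len(nrow(A)), function(v) fun(covar[A[v, ] > 0]))
}

# Hypothetical digraph with edges (1,2), (1,3), (2,3), (4,1).
g <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 0,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)

# Hypothetical vertex covariate.
age <- c(30, 40, 50, 20)

out_degree <- neighbor_apply(g, rep(1, 4), sum)  # counting neighbors
mean_age   <- neighbor_apply(g, age, mean)       # NaN where there are none
```

Summing a vector of ones over each neighborhood recovers the out-degree, which is the same check made with gapply later in this section; vertices with empty neighborhoods yield whatever the supplied function returns on an empty vector.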

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose (i, j) cell is an indicator for the membership of vertex j in vertex i’s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix (of the same order as the input graph), and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation, $R$, be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs (i, j) having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
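The definitions of partial and cumulative neighborhoods can be made concrete with a base R sketch that computes geodesic distances on the base relation. The helper `geodesic` is hypothetical and is not how neighborhood is implemented; the cumulative matrix below also excludes the trivial distance-0 pairs, which is an assumption of this sketch.

```r
# Geodesic distances on a 0/1 relation R, by repeated boolean multiplication.
geodesic <- function(R) {
  n <- nrow(R)
  D <- matrix(Inf, nrow = n, ncol = n)
  diag(D) <- 0
  reach <- diag(n)                   # pairs joined by a walk of length 0
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% R > 0) * 1   # pairs joined by a walk of length k
    newly <- reach == 1 & is.infinite(D)
    D[newly] <- k                    # first time reached, so distance is k
  }
  D
}

# Hypothetical digraph: the directed path 1 -> 2 -> 3 -> 4.
G <- matrix(0, nrow = 4, ncol = 4)
G[cbind(1:3, 2:4)] <- 1

# Base relation for each neighborhood type.
R_out   <- G
R_in    <- t(G)
R_total <- pmax(G, t(G))

D_out <- geodesic(R_out)
partial_2    <- (D_out == 2) * 1                # distance exactly 2
cumulative_2 <- (D_out >= 1 & D_out <= 2) * 1   # distance 1 or 2
D_total <- geodesic(R_total)
```

On the directed path, vertex 4 is unreachable from vertex 1 only in the incoming direction; with the total (symmetrized) base relation every pair is connected, which illustrates how the choice of base relation changes the resulting neighborhoods.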

To illustrate sna’s egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]

$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)

[1] TRUE

R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)

[1] TRUE

R> all(sapply(g.comb, NROW) <= degree(g) + 1)

[1] TRUE

R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))

         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego’s contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
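The alter-only density calculation can also be written out in base R, which makes the role of the "ego first" convention explicit. The helpers `ego_net` and `alter_density` are hypothetical stand-ins, not the sna functions, and the example digraph is arbitrary.

```r
# Combined-neighborhood ego net of vertex v, with ego placed in position 1.
ego_net <- function(A, v) {
  alters <- setdiff(which(A[v, ] > 0 | A[, v] > 0), v)
  idx <- c(v, alters)
  A[idx, idx, drop = FALSE]
}

# Density among alters only: drop ego's row and column before counting ties.
alter_density <- function(E) {
  k <- nrow(E) - 1                # number of alters
  if (k < 2) return(NA_real_)     # undefined with fewer than two alters
  sum(E[-1, -1]) / (k * (k - 1))
}

# Hypothetical digraph with edges (1,2), (1,3), (2,3), (4,1).
g <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 0,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)

e1 <- ego_net(g, 1)
d1 <- alter_density(e1)              # one tie among three alters
d3 <- alter_density(ego_net(g, 3))   # one tie among two alters
d4 <- alter_density(ego_net(g, 4))   # single alter, so undefined
```

Placing ego first means that removing row and column 1 always isolates the alters, regardless of which vertex is ego; the guard for fewer than two alters avoids a division by zero.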

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))

[1] TRUE

R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))

[1] TRUE

R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)

[1] TRUE

R> gapply(g, c(1, 2), 1:6, mean)

[1] 4.00 3.00 3.00 5.50 3.25 3.25

R> gapply(g, c(1, 2), 1:6, mean, distance = 2)

[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves we employ the neighborhoodfunction

R> g <- rgraph(10, tp = 2/9)

R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)

R> par(mfrow = c(3, 3))

R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))

R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)

R> par(mfrow = c(3, 3))

R> for(i in 1:9)
+   gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)

Figure 3: Sample visualizations using gplot, with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

R> par(mfrow = c(2, 3))

R> gplot(g, main = "Default")

R> gplot(g, usecurve = TRUE, main = "Curved Edges")

R> gplot(g, mode = "mds", main = "MDS Layout")

R> gplot(g, mode = "circle", main = "Circular Layout")

R> plot.sociomatrix(g, main = "Sociomatrix")

R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))

R> gplot3d(rgws(1, 5, 3, 1, 0.05))

R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can, in and of itself, be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))

R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")

R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)

R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")

R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")

R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")

R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)

R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3-5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{dt}(v,G) \equiv c_{d+}(v,G) + c_{d-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
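The degree and closeness definitions can be checked by hand on a small digraph. The sketch below uses base R only; the adjacency matrix `A` and the helper `geodesic` (a plain Floyd-Warshall pass) are hypothetical illustrations and are not how sna computes these quantities.

```r
# Hypothetical digraph on 4 vertices: 1->2, 2->3, 3->1, 1->4, 4->1.
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 3, 1, 4), c(2, 3, 1, 4, 1))] <- 1

outdeg      <- rowSums(A)      # |N+(v)|
indeg       <- colSums(A)      # |N-(v)|
freeman_deg <- outdeg + indeg  # total ("Freeman") degree

# Geodesic distances via Floyd-Warshall; unreachable pairs stay at Inf.
geodesic <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n))
    for (i in seq_len(n))
      for (j in seq_len(n))
        d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
  d
}

# Closeness: (n - 1) divided by the sum of geodesic distances from v.
closeness_scores <- (nrow(A) - 1) / rowSums(geodesic(A))

# Adding an isolated fifth vertex makes every distance sum infinite,
# so every closeness score collapses to 0.
A_iso <- rbind(cbind(A, 0), 0)
closeness_iso <- (nrow(A_iso) - 1) / rowSums(geodesic(A_iso))
```

The second computation illustrates the fragility noted above: a single unreachable vertex is enough to drive all scores to zero.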

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
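The role of $\beta$ in the Bonacich formula can be illustrated by solving the linear system directly. This is a base R sketch under stated assumptions: `bonacich_raw` is a hypothetical helper, the scaling constant $\alpha$ is simply fixed at 1 (the measure being defined only up to rescaling), and the three-vertex undirected path `P3` is an invented example.

```r
# Unscaled Bonacich power: solves (I - beta * A) x = A 1 for x, i.e. alpha = 1.
bonacich_raw <- function(A, beta) {
  n <- nrow(A)
  as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
}

# Hypothetical undirected path 1 - 2 - 3 (vertex 2 is the middle vertex).
P3 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow = 3, byrow = TRUE)

bp_zero <- bonacich_raw(P3, 0)     # beta = 0: reduces to degree
bp_pos  <- bonacich_raw(P3, 0.5)   # beta > 0: central alters raise ego's score
bp_neg  <- bonacich_raw(P3, -0.5)  # beta < 0: central alters lower ego's score
```

With $\beta = 0.5$ the ratio of the middle score to an end score moves from 2 (the degree ratio) toward $\sqrt{2}$, the corresponding ratio in the principal eigenvector of this path; with $\beta = -0.5$ the end vertices, whose only alter is the well-connected middle vertex, are pushed to zero.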

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
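The role definitions above translate directly into a brute-force count over ordered triads. The following base R sketch is an illustration only: `brokerage_roles`, the four-vertex chain `A`, and the state vector `s` are hypothetical, and the triple loop is written for clarity rather than speed.

```r
# Count Gould-Fernandez brokerage roles for each brokering vertex j.
brokerage_roles <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  scores <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (length(unique(c(i, j, k))) < 3) next
    # j brokers (i, k) when i -> j and j -> k exist but i -> k does not.
    if (A[i, j] == 0 || A[j, k] == 0 || A[i, k] != 0) next
    role <- if (s[i] == s[j] && s[j] == s[k]) {
      "coordinator"
    } else if (s[i] == s[k]) {
      "itinerant"
    } else if (s[j] == s[k]) {
      "gatekeeper"
    } else if (s[i] == s[j]) {
      "representative"
    } else {
      "liaison"
    }
    scores[j, role] <- scores[j, role] + 1
  }
  cbind(scores, total = rowSums(scores))
}

# Hypothetical chain 1 -> 2 -> 3 -> 4 with two groups.
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1
s <- c("a", "a", "b", "b")
b <- brokerage_roles(A, s)
```

Here vertex 2 passes a tie from its own group to an outsider (representative), while vertex 3 receives a tie from an outsider and passes it within its own group (gatekeeper).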

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)

R> degree(dat, cmode = "indegree")

 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")

 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)

 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)

 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)

 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)

 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)

 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)

 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)

 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")

 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)

[1] TRUE

R> bonpow(dat, exponent = -0.5)

 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)

R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
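Equations (1) and (2) can be checked against each other on hand-built score vectors. The sketch below is base R and purely illustrative: `freeman_centralization` is a hypothetical helper (not the sna centralization function), the degree vectors for an undirected five-vertex star and five-cycle are written out by hand, and the normalizing constant $(n-1)(n-2)$ is the maximum for undirected degree, assumed here rather than computed.

```r
# Equation (1): total deviation from the maximum observed score.
freeman_centralization <- function(scores) sum(max(scores) - scores)

# Hypothetical degree vectors: a 5-vertex star (hub plus 4 leaves) and a 5-cycle.
star_deg  <- c(4, 1, 1, 1, 1)
cycle_deg <- rep(2, 5)

raw_star  <- freeman_centralization(star_deg)
raw_cycle <- freeman_centralization(cycle_deg)

# Equation (2): |V| times (maximum score minus mean score).
alt_star <- length(star_deg) * (max(star_deg) - mean(star_deg))

# Normalization for undirected degree, assuming the theoretical maximum
# deviation (n - 1) * (n - 2), which is attained by the star.
n <- length(star_deg)
norm_star <- raw_star / ((n - 1) * (n - 2))
```

The cycle, in which all vertices are equivalent, scores 0, while the star attains the normalized maximum of 1.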

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied, respectively, by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))

R> gden(g)

[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)

[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")

[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)

[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)

[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.


[1]   0  21 106 254 582

R> connectedness(g)

[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)

[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")

[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)

[1] 0.2 1.0 1.0 1.0 1.0
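The gap between dyadic and edgewise reciprocity seen in the example follows directly from the dyad counts. The following base R sketch recomputes the three reciprocity notions described earlier on a hand-built digraph; `dyad_counts` and the matrix `A` are hypothetical illustrations and do not reproduce sna's handling of loops or missing data.

```r
# Count mutual, asymmetric, and null dyads of a digraph (loops ignored).
dyad_counts <- function(A) {
  up  <- upper.tri(A)
  fwd <- A[up]       # tie from the lower-numbered to the higher-numbered vertex
  bwd <- t(A)[up]    # tie in the opposite direction
  c(mutual     = sum(fwd == 1 & bwd == 1),
    asymmetric = sum(fwd != bwd),
    null       = sum(fwd == 0 & bwd == 0))
}

# Hypothetical digraph: 1 <-> 2, 1 -> 3, 3 -> 4.
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 1, 3), c(2, 1, 3, 4))] <- 1

d <- dyad_counts(A)
dyadic_recip   <- unname((d["mutual"] + d["null"]) / sum(d))         # symmetric dyads
nonnull_recip  <- unname(d["mutual"] / (d["mutual"] + d["asymmetric"]))
edgewise_recip <- unname(2 * d["mutual"] / sum(A))                   # reciprocated edges
```

In a sparse graph the dyadic score is dominated by null dyads, which is why it can be high even when few or no edges are reciprocated.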

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...){
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n - 1) * choose(n - 1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }

R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
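The structure statistics defined above reduce to counting entries of the geodesic distance matrix. The sketch below is base R only; `geodesic` (a plain Floyd-Warshall pass), `structure_stats`, and the three-vertex directed path `P` are hypothetical illustrations, not the sna structure.statistics routine.

```r
# Geodesic distances via Floyd-Warshall; unreachable pairs remain Inf.
geodesic <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n))
    for (i in seq_len(n))
      for (j in seq_len(n))
        d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
  d
}

# s_i = N^-2 * number of ordered pairs (j, k) with d(j, k) <= i, for i = 0..N-1.
structure_stats <- function(adj) {
  D <- geodesic(adj)
  N <- nrow(adj)
  vapply(0:(N - 1), function(i) sum(D <= i) / N^2, numeric(1))
}

# Hypothetical directed path 1 -> 2 -> 3.
P <- matrix(0, 3, 3)
P[cbind(c(1, 2), c(2, 3))] <- 1
ss <- structure_stats(P)
```

The series is nondecreasing by construction, and it reaches 1 only if every vertex can reach every other vertex; on this path it plateaus at 6/9 because the reverse direction is never reachable.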

At least since Davis and Leinhardt (1972) social network analysts have recognized the im-portance of subgraph frequencies as an indicator of underlying structural tendencies Thistheory has been considerably enriched in recent decades (see eg Frank and Strauss 1986Pattison and Robins 2002) particularly with respect to the connection between edgewisedependence conditions and structural biases (see Wasserman and Robins (2005) for an ap-proachable introduction) It has also been recognized that constraints on properties of small

28 Social Network Analysis with sna

subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G)$, $\mathbf{E}s'(H) = s'(G)$ for some $s, s'$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s'$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
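To make the dyad census concrete, the following is a minimal base-R sketch of the counting involved for a single directed graph. It is an illustration only and not the sna implementation: the function name dyad_counts and the toy matrix are assumptions introduced here.

```r
# Minimal base-R sketch of a dyad census for one directed graph.
# `adj` is assumed to be a square 0/1 adjacency matrix without loops;
# `dyad_counts` is a hypothetical helper, not part of sna.
dyad_counts <- function(adj) {
  n <- nrow(adj)
  upper <- upper.tri(adj)        # visit each unordered pair {i, j} once
  out_ij <- adj[upper]           # ties i -> j for i < j
  in_ij <- t(adj)[upper]         # ties j -> i for the same pairs
  mut <- sum(out_ij == 1 & in_ij == 1)
  asym <- sum(xor(out_ij == 1, in_ij == 1))
  null <- choose(n, 2) - mut - asym
  c(Mut = mut, Asym = asym, Null = null)
}

# Toy digraph on 4 vertices: 1 <-> 2, 1 -> 3, and vertex 4 isolated.
adj <- matrix(0, 4, 4)
adj[1, 2] <- 1
adj[2, 1] <- 1
adj[1, 3] <- 1
census <- dyad_counts(adj)
print(census)
```

The three counts always sum to the number of unordered vertex pairs, which gives a quick sanity check on any census output.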

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1, , ])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:   0
        Mean:  -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
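The logic of a conditional uniform graph test can also be sketched directly in base R. The example below is a simplified one-sample version that conditions on order and number of edges by shuffling off-diagonal cells; it is not how cugtest is implemented, and the helper names (count_mutual, draw_given_edges, cug_pvalues) are assumptions made for illustration.

```r
# Sketch of a one-sample conditional uniform graph (CUG) test in base R.
# Assumptions: `adj` is a 0/1 digraph adjacency matrix without loops, and the
# helper names below are illustrative only; they are not sna functions.
count_mutual <- function(adj) sum(adj * t(adj)) / 2

# Draw uniformly from digraphs with the same order and number of edges
# by shuffling the off-diagonal cells.
draw_given_edges <- function(adj) {
  off <- row(adj) != col(adj)
  h <- adj
  h[off] <- sample(adj[off])
  h
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_pvalues <- function(adj, stat, reps = 1000) {
  obs <- stat(adj)
  sim <- replicate(reps, stat(draw_given_edges(adj)))
  c(upper = mean(sim >= obs), lower = mean(sim <= obs))
}

set.seed(1)
# A 6-vertex digraph in which every tie is reciprocated (three mutual dyads).
adj <- matrix(0, 6, 6)
adj[cbind(c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 5))] <- 1
p <- cug_pvalues(adj, count_mutual, reps = 500)
print(p)
```

Because six edges can form at most three mutual dyads, the lower-tail estimate here is necessarily 1, while the upper-tail estimate is very small: the observed graph is far more reciprocated than expected given its order and density.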

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
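The distance-then-cluster workflow described above can be sketched with base R alone. The following is an illustrative Hamming-type structural equivalence distance followed by hclust; the helper se_hamming is a hypothetical name, and the treatment of entries involving the compared pair itself (they are skipped) is an assumption of this sketch rather than a statement about sedist.

```r
# Sketch: Hamming-type structural equivalence distances in base R.
# Assumptions: `adj` is a 0/1 digraph adjacency matrix; `se_hamming` is a
# hypothetical helper. Entries involving i and j themselves are skipped,
# so only ties to and from third parties are compared.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      others <- setdiff(seq_len(n), c(i, j))
      d[i, j] <- sum(adj[i, others] != adj[j, others]) +  # out-neighborhoods
        sum(adj[others, i] != adj[others, j])             # in-neighborhoods
    }
  }
  d
}

# Vertices 1 and 2 both send ties to 3 and 4; vertex 5 is isolated.
adj <- matrix(0, 5, 5)
adj[1, 3] <- adj[1, 4] <- adj[2, 3] <- adj[2, 4] <- 1
d <- se_hamming(adj)
clust <- hclust(as.dist(d), method = "complete")
classes <- cutree(clust, h = 0)   # classes of exactly equivalent vertices
print(classes)
```

Cutting the tree at height 0 recovers only exact equivalence; raising the height merges approximately equivalent vertices, which is the usual situation with empirical data.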

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
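As a rough illustration of density reduction and expansion, the sketch below computes a block density image from a membership vector and then draws a new graph from the expanded Bernoulli parameter matrix. Both helper names (block_density, expand_density) are hypothetical, and setting the density of an empty block (e.g., the diagonal block of a singleton class) to 0 is an assumption of this sketch.

```r
# Sketch of a density blockmodel and its expansion, in base R.
# Assumptions: `adj` is a 0/1 adjacency matrix without loops, `classes` is an
# integer membership vector, and both helper names are hypothetical.
block_density <- function(adj, classes) {
  k <- max(classes)
  diag(adj) <- NA                          # loops are ignored
  dens <- matrix(0, k, k)                  # empty blocks default to density 0
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      cells <- adj[classes == a, classes == b]
      if (any(!is.na(cells))) dens[a, b] <- mean(cells, na.rm = TRUE)
    }
  }
  dens
}

# Expand a density image into a new graph with `sizes[k]` vertices in class k.
expand_density <- function(dens, sizes) {
  new_classes <- rep(seq_along(sizes), times = sizes)
  prob <- dens[new_classes, new_classes]   # Bernoulli parameter matrix
  n <- length(new_classes)
  g <- matrix(rbinom(n * n, 1, prob), n, n)
  diag(g) <- 0
  g
}

# Class 1 = {1, 2} sends ties to every member of class 2 = {3, 4}.
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1
dens <- block_density(adj, c(1, 1, 2, 2))
set.seed(3)
g_new <- expand_density(dens, sizes = c(3, 3))
print(dens)
print(g_new)
```

Note that the expanded graph has six vertices rather than four: the block image fixes the pattern of ties among classes, not the number of vertices occupying each class.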

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
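The Boolean product characterization of composition is easy to check by hand. The sketch below is a base-R illustration (the function name compose is an assumption, not the package's operator) and shows how loops can arise from loopless inputs.

```r
# Sketch: composition of two graphs as a Boolean matrix product (base R).
# `compose` is a hypothetical name used only for this illustration.
compose <- function(a, b) {
  # (v, v') is tied iff some v'' satisfies a[v, v''] = 1 and b[v'', v'] = 1.
  (a %*% b > 0) * 1
}

# A loopless relation with a single mutual dyad: 1 -> 2 and 2 -> 1.
friend <- matrix(0, 3, 3)
friend[1, 2] <- 1
friend[2, 1] <- 1
ff <- compose(friend, friend)   # "friend of a friend"
print(ff)
```

Here the only two-step walks lead from vertex 1 back to itself and from vertex 2 back to itself, so the composed graph consists entirely of loops.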

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
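The graph covariance and correlation defined above translate almost line for line into base R. The following sketch is for illustration only (graph_cov and graph_cor are hypothetical names, not the sna routines) and assumes loopless directed graphs on a common vertex set.

```r
# Sketch of the graph covariance and correlation for directed graphs (base R).
# Assumptions: `a` and `b` are 0/1 adjacency matrices on the same vertex set,
# loops are ignored, and the names `graph_cov`/`graph_cor` are hypothetical.
graph_cov <- function(a, b) {
  off <- row(a) != col(a)        # the |V|(|V| - 1) directed dyads
  x <- a[off]
  y <- b[off]
  # mean(x) and mean(y) are the graph means; length(x) = |V|(|V| - 1).
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}

graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}

a <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
b <- 1 - a
diag(b) <- 0                     # complement of `a`, without loops
print(graph_cor(a, a))
print(graph_cor(a, b))
```

A graph is perfectly correlated with itself and perfectly negatively correlated with its complement, which makes these two cases convenient checks on any implementation.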

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
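To illustrate the idea of a central graph, the sketch below applies a cell-wise majority rule to a small graph stack, which minimizes the summed Hamming distance to the inputs. The helper names (hamming, central_graph) and the tie-breaking choice (cells reported by exactly half the graphs are set to 0) are assumptions of this sketch, not a description of the sna function.

```r
# Sketch: Hamming distance and a majority-rule central graph (base R).
# Assumptions: `graphs` is an m x n x n array of 0/1 adjacency matrices; the
# helper names are hypothetical, and cells with exactly half the votes are
# set to 0.
hamming <- function(a, b) sum(a != b)

central_graph <- function(graphs) {
  (apply(graphs, c(2, 3), mean) > 0.5) * 1
}

graphs <- array(0, dim = c(3, 3, 3))
graphs[1, 1, 2] <- 1               # tie 1 -> 2 appears in graph 1 ...
graphs[2, 1, 2] <- 1               # ... and in graph 2
graphs[3, 2, 3] <- 1               # tie 2 -> 3 appears only in graph 3
cg <- central_graph(graphs)
total <- sum(apply(graphs, 1, hamming, b = cg))
print(cg)
print(total)
```

The resulting distances to the central graph (or the full pairwise distance matrix) can then be passed to cmdscale or hclust for exploratory analysis.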

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
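The QAP null hypothesis described above amounts to relabeling the vertices of one graph at random and recomputing the statistic. A minimal base-R sketch for the graph correlation follows; qap_cor_test is a hypothetical helper written for this illustration and does not reproduce the interface of qaptest.

```r
# Sketch of a QAP permutation test for the graph correlation (base R).
# Assumptions: `a` and `b` are 0/1 digraph adjacency matrices on the same
# vertices; `qap_cor_test` is a hypothetical helper, not the sna interface.
qap_cor_test <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)
  n <- nrow(a)
  obs <- cor(a[off], b[off])
  sim <- replicate(reps, {
    p <- sample(n)                 # random relabeling of the vertices
    cor(a[p, p][off], b[off])      # rows and columns are permuted together
  })
  list(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

set.seed(42)
a <- matrix(rbinom(100, 1, 0.3), 10, 10)
diag(a) <- 0
b <- a
b[1, 2] <- 1 - b[1, 2]             # `b` is `a` with a single tie flipped
res <- qap_cor_test(a, b, reps = 500)
print(res)
```

Permuting rows and columns simultaneously preserves the structure of `a` while breaking its alignment with `b`, so the reference distribution reflects what correlation would be expected if the labeling were arbitrary.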


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173  0.680   0.320   0.503
x1           0.9411361  2.5628914  0.985   0.015   0.019
x2           4.1473292 63.2648084  1.000   0.000   0.000
x3           1.8630911  6.4436238  1.000   0.000   0.000
x4          -0.1757242  0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
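For intuition about the reporting process, consider the simplest case in which the false negative and false positive rates are taken as known. The posterior probability of a single edge then follows from Bayes' rule, as in the base-R sketch below. This is an illustration under stated assumptions, not the bbnam sampler: the function name edge_posterior, the independence of reports across informants, and the example rates are all hypothetical.

```r
# Sketch: posterior probability of one edge given informant reports, assuming
# known error rates and conditionally independent reports (base R).
# `reports` is a 0/1 vector (one entry per informant), `em` and `ep` are the
# false negative and false positive rates, and `prior` is the prior edge
# probability. `edge_posterior` is a hypothetical helper.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))   # edge truly present
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # edge truly absent
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Three informants with a 40% false negative and 5% false positive rate;
# only the first reports the tie.
post <- edge_posterior(c(1, 0, 0), em = 0.4, ep = 0.05)
print(post)
```

Because false positives are assumed to be rare relative to false negatives, a single positive report outweighs two negative ones in this example, and the posterior probability of the edge exceeds one half.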

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
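The conditional tie probability above is simple to evaluate numerically. The base-R sketch below does so for a baseline probability and a vector of bias event probabilities; the function name bn_tie_prob and the example values (which borrow the parameter labels pi and sigma purely for illustration) are assumptions, not output of the bn or rgbn routines.

```r
# Sketch: conditional tie probability under a biased net model (base R).
# `p_base` is Pr(B_e), `p_bias` holds the Pr(B_k), and `s` holds the counts
# s_k of bias events that could apply to the dyad. `bn_tie_prob` is a
# hypothetical helper.
bn_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

p_bias <- c(pi = 0.5, sigma = 0.25)
p_none <- bn_tie_prob(0.17, p_bias, s = c(0, 0))   # no bias events apply
p_recip <- bn_tie_prob(0.17, p_bias, s = c(1, 0))  # one event of the first type
p_two <- bn_tie_prob(0.17, p_bias, s = c(0, 2))    # two events of the second type
print(c(p_none, p_recip, p_two))
```

With no applicable bias events the probability reduces to the baseline; each additional potential bias event can only increase it, since the tie fails to form only if the baseline event and every bias event all fail.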

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

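$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$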

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
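To see how the AR and MA terms interact, it can help to simulate from the model in its reduced form. With a single weight matrix of each type ($w = z = 1$), Equations 4 and 5 give $\varepsilon = (I - \psi Z)^{-1}\nu$ and $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$, provided both inverses exist. The base-R sketch below simulates one such draw; all parameter values, the row-normalized random weight matrix, and the choice $Z = W$ are assumptions made for illustration only.

```r
# Sketch: simulating from a network autocorrelation model with one AR and one
# MA weight matrix, using the reduced form (base R). All numerical values are
# illustrative assumptions.
set.seed(7)
n <- 8
w <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(w) <- 0
w <- w / pmax(rowSums(w), 1)     # row-normalize; isolates keep zero rows
z <- w                           # same weights for the MA term (an assumption)

x <- cbind(1, rnorm(n))          # intercept plus one covariate
beta <- c(0.5, 2)
theta <- 0.3                     # AR parameter
psi <- 0.2                       # MA parameter
nu <- rnorm(n)                   # iid disturbances

eps <- solve(diag(n) - psi * z, nu)                 # eps = psi Z eps + nu
y <- solve(diag(n) - theta * w, x %*% beta + eps)   # y = theta W y + X beta + eps

# Residuals of the two structural equations; both should be numerically zero.
resid_ar <- max(abs(y - (theta * (w %*% y) + x %*% beta + eps)))
resid_ma <- max(abs(eps - (psi * (z %*% eps) + nu)))
print(c(resid_ar, resid_ma))
```

Row-normalizing the weight matrix keeps its spectral radius at or below 1, so the matrices being inverted are nonsingular for the small parameter values used here.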


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

Min 1stQ Median Mean 3rdQ Maxo1 03132 03599 03798 03864 04073 05071o2 02613 02944 03115 03187 03419 03995

42 Social Network Analysis with sna

o3 04148 04724 04937 04948 05213 05649o4 02511 03075 03246 03257 03448 04085o5 01814 02417 02681 02678 02887 03434o6 02881 03531 03761 03766 04046 04488o7 02395 03028 03211 03244 03449 03951o8 01444 02011 02209 02212 02398 02922o9 03708 04358 04529 04578 04787 05503o10 03210 03724 03967 03982 04259 04751o11 03064 03847 04093 04109 04371 05007o12 02367 03132 03354 03349 03607 04455o13 03534 04144 04386 04382 04600 05337o14 02438 02985 03235 03229 03452 04184o15 02585 03299 03510 03519 03706 04704o16 02502 03298 03481 03509 03699 04268o17 01759 02273 02488 02503 02668 03372o18 03959 04468 04646 04710 04922 05812o19 04944 05736 06007 05975 06189 06909o20 03737 04433 04631 04671 04916 05607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
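To see why the intersection rule is sensitive to false negatives, the aggregation rules can be sketched in a few lines of base R. This is an illustration only, not the sna implementation: the function names las_sketch and central_graph_sketch are hypothetical, reports are assumed to be dichotomous, and informant k is assumed to be vertex k (so that ego and alter reports exist for every edge).

```r
# Hypothetical base R sketch; not the sna implementation.
# dat[k, i, j] = 1 if informant k reports an (i, j) edge, 0 otherwise.
# Assumption: informant k is also vertex k.
las_sketch <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      reports <- c(dat[i, i, j], dat[j, i, j])  # ego's and alter's reports
      out[i, j] <- if (rule == "intersection") min(reports) else max(reports)
    }
  }
  out
}

# Majority rule: keep an edge reported by more than half of the informants.
central_graph_sketch <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Three informants; suppose the true graph has the single edge (1, 2).
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1   # ego reports the edge
dat[3, 1, 2] <- 1   # a third party reports it; alter (2) gives a false negative
las_sketch(dat, "intersection")[1, 2]  # 0: both ego and alter must agree
las_sketch(dat, "union")[1, 2]         # 1: a single report suffices
central_graph_sketch(dat)[1, 2]        # 1: two of three informants report it
```

A single false negative by ego or alter is enough to remove an edge under the intersection rule, whereas the majority rule tolerates it as long as most informants are correct.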

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
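For orientation, the simulation in this example is consistent with a model of the following form. The notation is assumed here rather than taken from the text: the symbols mirror the variables r1, r2, w1, w2, x, beta, sigma, and nu used in the code, and the normality of the disturbances follows from the rnorm call.

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I),
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
e &= (I - \rho_2 W_2)^{-1}\,\nu, \\
y &= (I - \rho_1 W_1)^{-1}\,(X\beta + e).
\end{aligned}
```

The two solved forms on the right correspond to the two qr.solve calls that generate e and y in the example.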

R> w1 <- rgraph(50)

R> w2 <- rgraph(50)

R> x <- matrix(rnorm(50 * 5), 50, 5)

R> r1 <- 0.2

R> r2 <- 0.3

R> sigma <- 0.1

R> beta <- rnorm(5)

R> nu <- rnorm(50, 0, sigma)

R> e <- qr.solve(diag(50) - r2 * w2, nu)

R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)

R> fit <- lnam(y, x, w1, w2)

R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form: (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems. Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

2. Package highlights

Given the wide scope of the methods implemented within the sna package, we cannot review them all in detail. In this section, however, we attempt to summarize the functionality of sna within a number of domains, highlighting specific functions and applications which are likely to be of general interest. Brief examples are also provided within each section to illustrate basic syntax and usage. Additional background and usage details are contained within the package manual, which is distributed with the package itself.

2.1. Random graph generation

sna has a range of tools for random graph generation. Chief among these is rgraph, a “workhorse” function for simulating deviates from both homogeneous and inhomogeneous Bernoulli graph distributions (Wasserman and Faust 1994). Given a set of tie probabilities (which may be specified by graph or by edge), it generates one or more graphs whose edge states are independent Bernoulli trials, conditional on the specified parameters.¹

In addition to rgraph, sna has several other tools for random graph generation. These currently include rgnm (which draws uniform graphs and digraphs conditional on edge count), rguman (which draws uniform digraphs conditional on expected or realized dyad census statistics), rgws (which draws from a Watts-Strogatz graph process; Watts and Strogatz 1998), and rgbn (which simulates a Skvoretz-Fararo biased net process; Skvoretz et al. 2004, see also Section 2.7). Also useful are tools such as rmperm and the rewire functions, which alter an input graph by random row/column, edgewise, or dyadic permutations. Functions which condition on degree distribution and the triad census are anticipated in future versions of sna.

Example

To provide a sense for the syntax involved (and options available) when generating random graphs in sna, we here provide a brief example of R code which draws graphs from a number of models. Note that the output type in each case is an adjacency matrix: although sna routines accept network and related objects as input (per Section 1.4), the package's current random graph generators produce output in adjacency matrix or array form. The range of output types may be expanded in future package versions. To begin, we first load the sna library and fix the random seed (for reproducibility):

R> library(sna)

R> set.seed(1913)

As noted above, rgraph can be used in various ways to obtain graphs (directed or otherwise) with different expected densities. For instance, three digraphs with respective expected densities 0.1, 0.9, and 0.5 can be drawn as follows:

R> g <- rgraph(10, 3, tprob = c(0.1, 0.9, 0.5))

R> gden(g)

[1] 0.1000000 0.8666667 0.5333333

¹ rgraph can also be employed to simulate valued graphs via a resampling procedure.

gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an (i, j) edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10, in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

R> gp <- sapply((1:10)/10, rep, 10)

R> g <- rgraph(10, tprob = gp)

R> g

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0

R> apply(g, 2, mean)

[1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)

R> apply(g, 1, sum)

[1] 12 12 12 12 12
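The two sampling schemes just described can be mimicked in base R to check the stated expectations. The sketch below is illustrative only and does not reproduce the internals of rgraph or rgnm; the helper names draw_bernoulli_digraph and draw_fixed_edge_digraph are hypothetical, and both helpers assume loopless digraphs.

```r
# Hypothetical helpers; base R only, not the sna implementations.
draw_bernoulli_digraph <- function(p) {
  # p: n x n matrix of edge probabilities; loops are disallowed.
  n <- nrow(p)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0
  g
}

draw_fixed_edge_digraph <- function(n, m) {
  # Uniform draw over loopless digraphs on n vertices with exactly m edges:
  # choose m of the n * (n - 1) off-diagonal cells without replacement.
  g <- matrix(0, n, n)
  cells <- which(row(g) != col(g))
  g[sample(cells, m)] <- 1
  g
}

set.seed(1913)
p <- sapply((1:10) / 10, rep, 10)             # p[i, j] = j / 10
draws <- replicate(2000, draw_bernoulli_digraph(p))
col_means <- apply(draws, 2, mean)            # average over rows and replicates
round(col_means, 2)                           # compare with 0.9 * (1:10) / 10
sum(draw_fixed_edge_digraph(10, 12))          # 12 by construction
```

Averaged over many replicates, the column means settle near 0.9(j/10), since one of the ten cells in each column is a structurally empty diagonal entry.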

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")

R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")

R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")

R> k10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0

R> t10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0

R> n10

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
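The dyad census used to parameterize these draws is easy to compute directly from an adjacency matrix. The following base R sketch uses a hypothetical helper, dyad_census_sketch, and assumes a dichotomous, loopless digraph; it is meant only to make the three classes concrete.

```r
# Hypothetical helper; assumes a 0/1 adjacency matrix without loops.
dyad_census_sketch <- function(g) {
  ut <- upper.tri(g)
  a <- g[ut]        # the i -> j edge of each dyad (i < j)
  b <- t(g)[ut]     # the j -> i edge of the same dyad
  c(mut  = sum(a == 1 & b == 1),
    asym = sum(a != b),
    null = sum(a == 0 & b == 0))
}

complete <- matrix(1, 10, 10)
diag(complete) <- 0
tournament <- matrix(0, 10, 10)
tournament[upper.tri(tournament)] <- 1   # one edge per dyad
empty <- matrix(0, 10, 10)

dyad_census_sketch(complete)     # mut 45, asym 0, null 0
dyad_census_sketch(tournament)   # mut 0, asym 45, null 0
dyad_census_sketch(empty)        # mut 0, asym 0, null 45
```

An order 10 digraph has choose(10, 2) = 45 dyads, which is why the three counts passed to rguman in exact mode sum to 45.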

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)

R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])

[1] 0.1482828

R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])

[1] 0.04646465

R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])

[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
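The Bernoulli comparison figures can be verified with a short calculation. The variable names below are illustrative; the only inputs are the dyad type probabilities used in the rguman call above.

```r
# Dyad type probabilities from the multinomial dyad model above.
p_mut <- 0.15
p_asym <- 0.05

# Each mutual dyad contributes two edges and each asymmetric dyad one,
# out of two possible edges per dyad.
density <- (2 * p_mut + p_asym) / 2            # 0.175

# Under a Bernoulli graph with this density, edges are independent.
bern_mut <- density^2                          # 0.030625
bern_asym <- 2 * density * (1 - density)       # 0.28875
ratio <- bern_asym / bern_mut                  # about 9.43

c(density = density, bern_mut = bern_mut, ratio = ratio)
```

The dyad model therefore produces roughly five times as many mutual dyads (0.15 versus about 0.03) as a Bernoulli graph of equal density.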

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases (π), “sibling” or shared partner biases (σ), and “double role” biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+    d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)

R> g[1, ] <- 1

R> g2 <- rewire.ws(g, 0.5)[1, , ]

R> g2

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0

R> sum(g - g2) == 0

[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)

R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))

[1] TRUE

Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
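The invariance described here is simple to demonstrate in base R. The sketch below uses a hypothetical helper, permute_vertices, which applies the same ordering to rows and columns; it is not the rmperm implementation, and the test graph is arbitrary.

```r
# Hypothetical helper; base R only.
permute_vertices <- function(g, ord = sample(nrow(g))) {
  g[ord, ord]   # relabel vertices: rows and columns move together
}

set.seed(1)
g <- matrix(rbinom(36, 1, 0.3), 6, 6)
diag(g) <- 0
h <- permute_vertices(g)

same_edges  <- sum(g) == sum(h)                          # edge count
same_outdeg <- all(sort(rowSums(g)) == sort(rowSums(h))) # outdegree sequence
same_indeg  <- all(sort(colSums(g)) == sort(colSums(h))) # indegree sequence
same_mutual <- sum(g * t(g)) == sum(h * t(h))            # reciprocated edges
c(same_edges, same_outdeg, same_indeg, same_mutual)
```

Any statistic that does not depend on vertex labels is unchanged by such a permutation, which is what makes it a natural null transformation for matrix comparison tests.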

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)

R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)

[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, but does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
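The generic (and inefficient) strategy described above amounts to evaluating the statistic twice. A minimal base R sketch follows; edge_perturbation_sketch, density_stat, and mutual_count are hypothetical names introduced for illustration, and the sketch handles a single edge of a loopless 0/1 digraph.

```r
# Hypothetical sketch of a generic change score; not the sna implementation.
edge_perturbation_sketch <- function(g, i, j, stat, ...) {
  g_present <- g
  g_present[i, j] <- 1          # force the (i, j) edge to be present
  g_absent <- g
  g_absent[i, j] <- 0           # force the (i, j) edge to be absent
  stat(g_present, ...) - stat(g_absent, ...)   # two full evaluations
}

# Two simple graph-level statistics (loopless digraphs assumed).
density_stat <- function(g) sum(g) / (nrow(g) * (nrow(g) - 1))
mutual_count <- function(g) sum(g * t(g)) / 2

g <- matrix(0, 5, 5)
g[2, 1] <- 1
edge_perturbation_sketch(g, 1, 2, mutual_count)   # 1: adding 1 -> 2 closes a mutual dyad
edge_perturbation_sketch(g, 1, 2, density_stat)   # 0.05 = 1 / 20
```

For the mutual count, the same answer is available in constant time by inspecting g[j, i] alone, which illustrates why specialized change-score routines are preferred in performance-critical settings.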

Another pair of useful but idiosyncratic utility functions are rperm and numperm whichproduce permutation vectors with specified characteristics (Recall that permuting a graphrsquosadjacency matrix is equivalent to altering the ldquoidentitiesrdquo of its vertices while leaving theunderlying ldquounlabeledrdquo structure unchanged) Although not graph manipulation functionsper se these routines are of importance for generating restricted permutations for use inQAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005)rperm draws a (uniform) random permutation vector such that vertices may only be exchangedif they belong to the same (user-supplied) equivalence class numperm is a deterministicfunction which returns the nth (unconstrained) permutation in lexical sort order this isuseful for exhaustive search through a (hopefully small) permutation set or when samplingpermutations without replacement
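A restricted permutation of the kind described for rperm can be sketched in base R as follows. The helper name restricted_perm_sketch and its single argument (a vector of class labels, one per vertex) are assumptions made for illustration and are not the rperm interface.

```r
# Hypothetical sketch; vertices are shuffled only within their own class.
restricted_perm_sketch <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # Guard: sample() on a length-one vector would sample from 1:idx.
    if (length(idx) > 1) perm[idx] <- sample(idx)
  }
  perm
}

set.seed(42)
classes <- c("a", "a", "b", "b", "b", "c")
perm <- restricted_perm_sketch(classes)
perm
all(classes[perm] == classes)   # TRUE: every vertex stays within its class

# Applying the permutation to an adjacency matrix relabels the vertices.
g <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(g) <- 0
g_perm <- g[perm, perm]
```

Because each class is shuffled independently and uniformly, the result is a uniform draw from the set of permutations that respect the class structure.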

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
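The induced-subgraph definition can be sketched directly in base R. The helper ego_net below is hypothetical (it is not ego.extract); it assumes a 0/1 adjacency matrix in which rows send ties, and it places ego in the first row and column so that ego can easily be dropped.

```r
# Hypothetical helper: ego net of vertex v, with ego placed first.
ego_net <- function(adj, v, type = c("combined", "in", "out")) {
  type <- match.arg(type)
  out_nb <- which(adj[v, ] != 0)
  in_nb <- which(adj[, v] != 0)
  nb <- switch(type,
               out = out_nb,
               "in" = in_nb,
               combined = union(out_nb, in_nb))
  keep <- c(v, setdiff(nb, v))  # ego first, then alters (loops ignored)
  adj[keep, keep, drop = FALSE]
}

# Small digraph: 1 -> 2, 3 -> 1, 2 -> 3, 4 -> 2
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[3, 1] <- adj[2, 3] <- adj[4, 2] <- 1

ego_out <- ego_net(adj, 1, "out")        # ego 1 plus vertex 2
ego_in <- ego_net(adj, 1, "in")          # ego 1 plus vertex 3
ego_comb <- ego_net(adj, 1, "combined")  # ego 1 plus vertices 2 and 3

# Density among alters only (ego's row and column dropped).
alters <- ego_comb[-1, -1, drop = FALSE]
alter_density <- sum(alters) / (nrow(alters) * (nrow(alters) - 1))
print(alter_density)
```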

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
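As a rough illustration of the "apply a function to neighbors' covariates" pattern, consider the base R sketch below. neighbor_apply is a hypothetical, simplified stand-in restricted to first-order out-neighborhoods; it does not reproduce gapply's interface or options.

```r
# Hypothetical helper: apply FUN to the covariate values of each vertex's
# out-neighbors.
neighbor_apply <- function(adj, covar, FUN) {
  sapply(seq_len(nrow(adj)), function(v) FUN(covar[adj[v, ] != 0]))
}

# Digraph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
adj <- matrix(0, 3, 3)
adj[1, 2] <- adj[1, 3] <- adj[2, 3] <- adj[3, 1] <- 1
age <- c(30, 40, 50)  # illustrative covariate

out_deg <- neighbor_apply(adj, rep(1, 3), sum)  # counting neighbors gives outdegree
mean_age <- neighbor_apply(adj, age, mean)      # mean covariate among alters
print(rbind(out_deg, mean_age))
```

Summing a vector of ones recovers degree, which is the same check used in the gapply examples later in this section.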

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning, for a given graph, the adjacency matrix whose $i, j$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
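The partial and cumulative definitions can be checked on a toy graph with a short base R sketch. All three helpers are hypothetical, and one choice is an assumption rather than something stated in the text: the cumulative neighborhood here collects distances 1 through k and leaves out the distance-0 (self) pairs.

```r
# Hypothetical helper: geodesic distances on a 0/1 adjacency matrix, found
# by extending walks one step at a time.
geo_dist <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  reach <- diag(n)  # pairs joined by a walk of the current length
  for (k in seq_len(n - 1)) {
    reach <- ((reach %*% adj) > 0) * 1
    d[reach == 1 & is.infinite(d)] <- k  # first time reached = geodesic length
  }
  d
}

# Partial neighborhood: pairs at geodesic distance exactly k.
partial_nbhd <- function(adj, k) (geo_dist(adj) == k) * 1

# Cumulative neighborhood: pairs at distance 1..k (assumption: self excluded).
cumulative_nbhd <- function(adj, k) {
  d <- geo_dist(adj)
  (d >= 1 & d <= k) * 1
}

# Directed path 1 -> 2 -> 3 -> 4, using outgoing ties as the base relation.
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 3] <- adj[3, 4] <- 1

p2 <- partial_nbhd(adj, 2)     # pairs (1, 3) and (2, 4)
c2 <- cumulative_nbhd(adj, 2)  # the three edges plus the two pairs above
print(p2)
```

Incoming neighborhoods follow by passing t(adj), and total neighborhoods by passing ((adj + t(adj)) > 0) * 1 as the base relation.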

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x){gden(x[-1, -1])})
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2 / 9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

[Figure 1 panels: Partial Neighborhood of Order 1 through Partial Neighborhood of Order 9.]

Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing the optional rgl package for R enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

[Figure 2 panels: Cumulative Neighborhood of Order 1 through Cumulative Neighborhood of Order 9.]

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc., are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

[Figure 3 panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.]

Figure 3: Sample visualizations using gplot with multiple layout and display options.

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

[Figure 5 panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.]

Figure 5: Examples of the use of gplot supplemental functions.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v', v, v'', G)}{g(v', v'', G)}$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
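The degree and closeness definitions are simple enough to verify by hand. The base R sketch below does so on a four-vertex digraph; geo_dist and closeness_scores are hypothetical helpers written for this illustration (they are not the sna routines), and rows of the adjacency matrix are assumed to send ties.

```r
# Hypothetical helper: geodesic distances on a 0/1 adjacency matrix.
geo_dist <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  reach <- diag(n)
  for (k in seq_len(n - 1)) {
    reach <- ((reach %*% adj) > 0) * 1
    d[reach == 1 & is.infinite(d)] <- k
  }
  d
}

# Closeness as (n - 1) / sum of geodesic distances; unreachable pairs have
# infinite distance, which drives the score to 0.
closeness_scores <- function(adj) (nrow(adj) - 1) / rowSums(geo_dist(adj))

# Strongly connected digraph: 1 <-> 2, 2 -> 3, 3 -> 4, 4 -> 1
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- adj[2, 3] <- adj[3, 4] <- adj[4, 1] <- 1

outdeg <- rowSums(adj)
indeg <- colSums(adj)
total <- outdeg + indeg  # "Freeman" degree
clo <- closeness_scores(adj)

# Removing 4 -> 1 leaves vertex 4 unable to reach anyone.
broken <- adj
broken[4, 1] <- 0
clo_broken <- closeness_scores(broken)
print(rbind(outdeg, indeg, total, clo))
```

The second computation shows the fragility described above: a single missing edge sends the closeness of the affected vertices to 0.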

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of Euclidean length $\sqrt{|V|}$ (i.e., the squared scores sum to $|V|$, as in the bonpow output shown later in this section); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
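A direct base R transcription of the Bonacich formula helps make the role of $\beta$ concrete. The function bonacich_power is a hypothetical sketch, not the bonpow source; it assumes the scaling convention that the squared scores sum to the number of vertices, and it assumes that $I - \beta A$ is invertible for the chosen $\beta$.

```r
# Hypothetical sketch of c = alpha * (I - beta * A)^{-1} A 1, with alpha
# chosen so that sum(c^2) equals the number of vertices (assumed convention).
bonacich_power <- function(adj, beta) {
  n <- nrow(adj)
  raw <- solve(diag(n) - beta * adj, adj %*% rep(1, n))
  as.vector(raw * sqrt(n / sum(raw^2)))
}

# Digraph: 1 <-> 2, 2 -> 3, 3 -> 4, 4 -> 1 (outdegrees 1, 2, 1, 1)
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- adj[2, 3] <- adj[3, 4] <- adj[4, 1] <- 1

bp_zero <- bonacich_power(adj, 0)    # proportional to outdegree
bp_neg <- bonacich_power(adj, -0.5)  # rewards ties to weakly connected alters

ratio <- bp_zero / rowSums(adj)      # constant across vertices
print(ratio)
```

At $\beta = 0$ the inverse drops out and the raw score is just $A\mathbf{1}$, so the ratio to outdegree is the same for every vertex.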

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
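The role definitions above translate into a brute-force count over ordered triads. The following base R sketch is illustrative only: brokerage_counts is a hypothetical helper, it returns raw counts without the expectations or z-scores reported by brokerage, and its role labels follow the state conditions given in the preceding paragraph.

```r
# Hypothetical helper: raw Gould-Fernandez role counts per vertex. A triad
# (i, j, k) qualifies when i -> j and j -> k exist but i -> k does not;
# vertex j is the broker.
brokerage_counts <- function(adj, s) {
  n <- nrow(adj)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (length(unique(c(i, j, k))) < 3) next
    if (adj[i, j] == 0 || adj[j, k] == 0 || adj[i, k] != 0) next
    role <- if (s[i] == s[j] && s[j] == s[k]) {
      "coordinator"
    } else if (s[i] == s[k]) {
      "itinerant"
    } else if (s[j] == s[k]) {
      "gatekeeper"
    } else if (s[i] == s[j]) {
      "representative"
    } else {
      "liaison"
    }
    out[j, role] <- out[j, role] + 1
  }
  cbind(out, total = rowSums(out))
}

# Directed path 1 -> 2 -> 3 -> 4 with group labels a, a, b, c.
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 3] <- adj[3, 4] <- 1
s <- c("a", "a", "b", "c")

res <- brokerage_counts(adj, s)
print(res)
```

Here vertex 2 brokers the triad (1, 2, 3) with $s_1 = s_2 \neq s_3$, and vertex 3 brokers (2, 3, 4) across three distinct groups.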

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $(i, j, k) \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores [2]) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$) [3], although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

[2] For instance, when all vertices are automorphically equivalent.
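As a quick numerical check that equations (1) and (2) agree, the base R sketch below evaluates both on an undirected star using degree centrality. The function centralization_raw is a hypothetical helper, and the normalizing constant $(n-1)(n-2)$ is the standard maximum for undirected degree centralization on simple graphs; neither is taken from the sna source.

```r
# Equation (1): total deviation from the maximum observed centrality score.
centralization_raw <- function(scores) sum(max(scores) - scores)

# Undirected star on 5 vertices (vertex 1 is the hub).
n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1
deg <- rowSums(star)  # degree centrality: 4 for the hub, 1 elsewhere

c1 <- centralization_raw(deg)     # equation (1)
c2 <- n * (max(deg) - mean(deg))  # equation (2)

# Dividing by the maximum raw score for this order gives the normalized index.
normalized <- c1 / ((n - 1) * (n - 2))
print(c(c1, c2, normalized))
```

Both forms give 12 for this graph, and the normalized score is 1, consistent with the star being the maximally centralized case for degree.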

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

[3] $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {

+   n <- NROW(dat)

+   if(tmaxdev)

+     return((n-1) * choose(n-1, 2))

+   odeg <- degree(dat, cmode = "outdegree")

+   choose(odeg, 2)

+ }

R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

\[ s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\bigl(d(j, k) \le i\bigr) \]

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
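The definition of the structure statistics can be made concrete with a short base R sketch. It does not use sna: the helper name structure_stats is hypothetical, and geodesic distances are obtained here with a simple Floyd-Warshall pass, which is an assumption of this illustration rather than a description of how sna computes them.

```r
# Hypothetical helper (not part of sna): structure statistics s_0, ..., s_{N-1}
# computed from a binary adjacency matrix using base R only.
structure_stats <- function(adj) {
  n <- nrow(adj)
  # Geodesic distances via Floyd-Warshall; Inf marks unreachable pairs.
  d <- ifelse(adj != 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n)) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  # s_i = fraction of ordered pairs (j, k), including j = k, with d(j, k) <= i.
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed 4-cycle: 1 -> 2 -> 3 -> 4 -> 1
cyc <- matrix(0, 4, 4)
cyc[cbind(1:4, c(2:4, 1))] <- 1
print(structure_stats(cyc))
```

Because the double sum includes the pairs with $j = k$, $s_0$ is always $1/N$; the remaining values show how quickly the rest of the graph becomes reachable.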

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
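The logic of a one-tailed CUG test can be sketched in base R without sna. The sketch below conditions on order and on the exact number of edges, and uses the count of mutual dyads as the statistic $t$; the function names mutual_count, rgraph_edges, and cug_sketch are hypothetical, and this is an illustration of the Monte Carlo idea rather than a reproduction of cugtest.

```r
# Count mutual dyads in a directed adjacency matrix (diagonal ignored).
mutual_count <- function(adj) {
  diag(adj) <- 0
  sum(adj * t(adj)) / 2
}

# Draw a digraph uniformly at random given its order n and edge count m.
rgraph_edges <- function(n, m) {
  adj <- matrix(0, n, n)
  off_diag <- which(row(adj) != col(adj))
  adj[off_diag[sample.int(length(off_diag), m)]] <- 1
  adj
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_sketch <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  diag(adj) <- 0
  m <- sum(adj != 0)
  obs <- stat(adj)
  sims <- replicate(reps, stat(rgraph_edges(n, m)))
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(1)
# Fully reciprocated graph on 8 vertices: 4 mutual dyads, 8 directed edges.
g <- matrix(0, 8, 8)
g[cbind(c(1, 3, 5, 7), c(2, 4, 6, 8))] <- 1
g <- g + t(g)
res <- cug_sketch(g, mutual_count)
print(res)
```

A small upper-tail value here would indicate more reciprocity than expected among graphs with the same order and edge count.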

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",

+   dyadic.tabulation = "bylength")$path.count

   Agg   v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
1   35    8    3    9    2   10    9    3   10    8    8
2  119   40   10   47    8   59   47   13   56   39   38
3  346  155   41  180   35  223  185   52  211  149  153
4  791  457  130  504  114  601  527  163  572  425  462
5 1351  964  303 1000  282 1143 1061  375 1104  884  990

R> kcycle.census(g3[1,,], maxlen = 5,

+   cycle.comembership = "bylength")$cycle.count

  Agg  v1  v2  v3  v4  v5  v6  v7  v8  v9 v10
2   9   2   1   2   0   3   2   0   4   3   1
3  24   7   1  11   0  15   9   2  12   8   7
4  42  16   1  23   2  32  26   3  30  19  16
5  72  39   5  48   8  60  54  10  57  36  43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]

R> g4[2,,] <- g2[1,,]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",

+   g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:   0
        Mean: -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
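The distance-then-cluster workflow described above can be sketched with base R alone. The helper se_hamming below is hypothetical and deliberately simplified: it compares each pair of vertices only on their ties to and from third parties, which is one possible convention and not necessarily the one sna uses. The clustering step relies on R's own hclust and cutree.

```r
# Hypothetical helper: Hamming distance between the tie profiles of each
# pair of vertices, comparing out-ties and in-ties to third parties only.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      others <- setdiff(seq_len(n), c(i, j))
      d[i, j] <- sum(adj[i, others] != adj[j, others]) +
        sum(adj[others, i] != adj[others, j])
    }
  }
  d
}

# Two exact positions: vertices 1-3 each send ties to vertices 4-6.
adj <- matrix(0, 6, 6)
adj[1:3, 4:6] <- 1
d <- se_hamming(adj)

# Cluster on the distances and cut into two classes (positions).
pos <- cutree(hclust(as.dist(d), method = "complete"), k = 2)
print(pos)
```

Vertices at distance zero are exactly structurally equivalent under this simplified measure; with noisy data the same steps yield approximate positions.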

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i, j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
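Density-based expansion amounts to a Bernoulli draw per dyad, which a few lines of base R can illustrate. The function expand_density and its arguments are hypothetical names for this sketch; it assumes a square matrix of interclass densities and a vector of class sizes, and it sets the diagonal to zero (no loops) as a simplifying assumption.

```r
# Hypothetical sketch of density-block expansion: each interclass density
# is treated as the edge probability for all dyads between those classes.
expand_density <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)   # class label of each new vertex
  p <- dens[cls, cls, drop = FALSE]             # expanded Bernoulli parameter matrix
  n <- length(cls)
  g <- matrix(rbinom(n * n, 1, p), n, n)        # one independent draw per cell
  diag(g) <- 0                                  # assumption: loops are excluded
  g
}

set.seed(1)
dens <- matrix(c(0.9, 0.1,
                 0.2, 0.8), 2, 2, byrow = TRUE)
g_new <- expand_density(dens, sizes = c(3, 4))
print(g_new)
```

Because the class sizes are supplied separately from the densities, the expanded graph need not have the same order as the graph from which the densities were estimated.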

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
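The boolean inner product that defines graph composition is easy to write out directly. The following base R sketch uses a hypothetical function name, compose_graphs, and operates on plain adjacency matrices; it also shows how a loop can arise in the composition of two loopless graphs.

```r
# Hypothetical helper: composition of two graphs on the same vertex set,
# computed as the boolean product of their adjacency matrices.
compose_graphs <- function(a, b) {
  a01 <- (a != 0) * 1
  b01 <- (b != 0) * 1
  ((a01 %*% b01) > 0) * 1   # edge (v, v') iff some v'' links v to v'
}

g <- matrix(0, 3, 3)
g[1, 2] <- 1                  # G: 1 -> 2
gp <- matrix(0, 3, 3)
gp[2, 1] <- 1                 # G': 2 -> 1
gp[2, 3] <- 1                 # G': 2 -> 3
comp <- compose_graphs(g, gp)
print(comp)
```

Here vertex 1 reaches itself through vertex 2, so the composition has a loop at vertex 1 although neither input graph has one.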

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network, without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

\[ \mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \bigl(A^G_{ij} - \mu_G\bigr)\bigl(A^H_{ij} - \mu_H\bigr)}{|V| \, (|V| - 1)} \tag{3} \]

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \bigl(|V| (|V| - 1)\bigr)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G) \, \mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
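Equation (3) and the associated correlation translate directly into base R. The names graph_cov and graph_cor are hypothetical; the sketch assumes two directed graphs on the same vertex set, given as adjacency matrices, and sums over off-diagonal cells only, as in the formula.

```r
# Hypothetical helpers implementing Equation (3) for directed graphs.
graph_cov <- function(a, b) {
  off <- row(a) != col(a)        # the |V|(|V| - 1) ordered pairs (i, j), i != j
  x <- a[off]
  y <- b[off]
  mean((x - mean(x)) * (y - mean(y)))
}

graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}

# A directed path 1 -> 2 -> 3 -> 4 and its transpose (all edges reversed).
a <- matrix(0, 4, 4)
a[cbind(1:3, 2:4)] <- 1
b <- t(a)
print(graph_cov(a, b))
print(graph_cor(a, b))
```

Each graph has 3 edges among 12 ordered pairs and the two share no edges, so the covariance is $-(1/4)^2 = -1/16$ and the correlation is $-1/3$.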

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
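The QAP null hypothesis can be illustrated in base R by relabeling the vertices of one graph and recomputing a bivariate statistic each time. The function qap_cor is a hypothetical name for this sketch, which uses the off-diagonal product-moment correlation as the statistic; it is an illustration of the permutation idea and not a substitute for qaptest.

```r
# Hypothetical sketch of a QAP test for the correlation between two graphs.
qap_cor <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)
  n <- nrow(a)
  obs <- cor(a[off], b[off])
  sims <- replicate(reps, {
    p <- sample.int(n)            # random relabeling of the vertices of a
    cor(a[p, p][off], b[off])     # rows and columns are permuted together
  })
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(2)
n <- 10
a <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(a) <- 0
noise <- matrix(rbinom(n * n, 1, 0.1), n, n)
b <- abs(a - noise)               # b is a with roughly 10% of cells flipped
diag(b) <- 0
res <- qap_cor(a, b)
print(res)
```

Permuting rows and columns together preserves the structure of the first graph while breaking its alignment with the second, which is what distinguishes this null from an edgewise shuffle.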


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> nl <- netlm(y, x)

R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> yp <- apply(yl, c(1, 2), function(a){1 / (1 + exp(-a))})

R> y <- rgraph(20, tprob = yp)

R> nl <- netlogit(y, x)

R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173  0.680   0.320   0.503
x1           0.9411361  2.5628914  0.985   0.015   0.019
x2           4.1473292 63.2648084  1.000   0.000   0.000
x3           1.8630911  6.4436238  1.000   0.000   0.000
x4          -0.1757242  0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

\[ \Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i, j, T)} \]

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
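The conditional tie probability above is a simple "at least one event occurs" calculation, which the following base R sketch makes explicit. The function name bn_tie_prob and its argument names are hypothetical, and the bias labels and numeric values in the example are illustrative assumptions rather than estimates from any data set.

```r
# Hypothetical helper: Pr(e_ij | T) given the baseline probability, the
# bias event probabilities, and the counts s_k of potential bias events.
bn_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The tie is absent only if the baseline event and every potential
  # bias event all fail to occur.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.17; one potential event of a bias with probability 0.5 and
# two potential events of a bias with probability 0.25.
p <- bn_tie_prob(p_base = 0.17, p_bias = c(0.5, 0.25), s = c(1, 2))
print(p)
```

With no potential bias events (all counts zero), the expression reduces to the baseline probability, which corresponds to a Bernoulli graph.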

While much attention in social network analysis is directed to structural properties per sewe may also consider models for the effect of structure on individual attributes The linearnetwork autocorrelation models (see Doreian (1990) and Cliff and Ord (1973) Anselin (1988)for the equivalent class of spatial autocorrelation models) constitute one important family ofprocesses which are often used for this purpose These models are of the form

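$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$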

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
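Assuming that the matrices $I - \sum_i \theta_i W_i$ and $I - \sum_i \psi_i Z_i$ are invertible (an assumption not stated above), Equations 4 and 5 can be solved for $\varepsilon$ and $y$ in turn. This gives a reduced form that is convenient when simulating from the model:

```latex
\begin{aligned}
\Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)\,\varepsilon &= \nu
  &&\Longrightarrow\quad
  \varepsilon = \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \\
\Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)\,y &= X\beta + \varepsilon
  &&\Longrightarrow\quad
  y = \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1} \bigl(X\beta + \varepsilon\bigr).
\end{aligned}
```

Each step is a single linear solve, so no explicit matrix inverse needs to be formed in practice.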


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors, or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
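The prior mass on the “perverse” region can be checked before fitting. The following base R sketch assumes independent beta priors with common shape parameters on the false negative and false positive rates, and integrates the probability that their sum exceeds 1. The helper perverse_mass is hypothetical and is not part of sna.

```r
# Hypothetical helper: Pr(em + ep > 1) when em and ep are independent
# Beta(shape1, shape2) variables (an assumption made for this sketch).
perverse_mass <- function(shape1, shape2) {
  integrand <- function(x) {
    # density of em at x, times Pr(ep > 1 - x)
    dbeta(x, shape1, shape2) *
      pbeta(1 - x, shape1, shape2, lower.tail = FALSE)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

mass_b211 <- perverse_mass(2, 11)  # beta(2, 11), as in the example priors
mass_unif <- perverse_mass(1, 1)   # uniform priors, for comparison
print(c(beta_2_11 = mass_b211, uniform = mass_unif))
```

Under these assumptions the beta(2, 11) priors place only a negligible share of their mass on the region, whereas independent uniform priors place half of their mass there.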

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
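The sensitivity of majority-style estimators to error rates near 0.5 is easy to see in a toy case. The sketch below is a base R illustration of a strict majority rule over informant reports; the helper majority_graph is hypothetical, the tie-breaking rule is an assumption, and the code is not sna's implementation of the central graph.

```r
# Hypothetical helper (not sna's implementation): a tie is declared present
# when strictly more than half of the informants report it.
# reports: m x n x n array of 0/1 adjacency matrices, one per informant.
majority_graph <- function(reports) {
  (apply(reports, c(2, 3), mean) > 0.5) * 1
}

# A small "true" digraph on three vertices.
truth <- matrix(c(0, 1, 0,
                  0, 0, 1,
                  1, 0, 0), nrow = 3, byrow = TRUE)

# Two accurate informants and one who reverses every off-diagonal report.
wrong <- 1 - truth
diag(wrong) <- 0
reports <- array(0, dim = c(3, 3, 3))
reports[1, , ] <- truth
reports[2, , ] <- truth
reports[3, , ] <- wrong

est <- majority_graph(reports)
print(all(est == truth))  # TRUE: the accurate majority outvotes the third informant
```

If two of the three informants were replaced by the reversed reports, the same rule would return the reversed graph, which is the kind of failure the text describes when error rates approach or exceed 0.5.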

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


gden, which we shall encounter again later, is an sna function which returns the density of one or more input graphs; as expected, the observed densities here closely match their expectations. The tprob parameter, used above to set the probability of each edge on a per-graph basis, can also be used in other ways. For instance, passing a matrix of Bernoulli parameters to tprob will cause rgraph to sample from the corresponding inhomogeneous Bernoulli graph model (in which the probability of an (i, j) edge is equal to tprob[i, j]). For example, consider a simple model for a digraph of order 10, in which the probability of an (i, j) edge is equal to j/10. Such a graph can be drawn easily as follows:

R> gp <- sapply((1:10) / 10, rep, 10)
R> g <- rgraph(10, tprob = gp)
R> g
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    1    1     1
 [2,]    0    0    0    1    0    1    0    0    1     1
 [3,]    0    0    0    0    0    1    0    1    0     1
 [4,]    0    0    0    0    1    1    1    1    1     1
 [5,]    0    1    0    0    0    0    1    1    1     1
 [6,]    0    0    1    0    1    0    1    0    1     1
 [7,]    0    1    1    0    1    0    0    1    1     1
 [8,]    0    0    1    1    1    0    1    0    1     1
 [9,]    0    0    0    1    1    0    1    1    0     1
[10,]    0    0    0    0    0    0    1    1    1     0
R> apply(g, 2, mean)
 [1] 0.0 0.2 0.3 0.3 0.6 0.3 0.6 0.7 0.8 0.9

Since rgraph disallows loops by default, diagonal entries are ignored in the above cases; thus, the column means here have expectation 0.9(j/10). The observed means are quite close to this, but obviously vary due to the underlying Bernoulli process. For random graphs with exact constraints on edge count, we must use rgnm. For instance, to take 5 draws from the uniform distribution on the order 10 graphs having 12 edges, we would proceed as follows:

R> g <- rgnm(5, 10, 12)
R> apply(g, 1, sum)
[1] 12 12 12 12 12
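As a check on the expectation 0.9(j/10) quoted for the inhomogeneous Bernoulli example, the expected column means can be computed directly from the matrix of tie probabilities. The following base R sketch assumes only that diagonal cells always contribute zero; it does not call sna, and the variable names are chosen for illustration.

```r
n  <- 10
gp <- sapply((1:n) / n, rep, n)   # gp[i, j] = j / n, as in the example above

# Loops are disallowed, so diagonal cells can never hold an edge.
gp_noloop <- gp
diag(gp_noloop) <- 0

# Expected value of apply(g, 2, mean): the mean of each column of
# tie probabilities, with the diagonal cell counted as zero.
expected_col_means <- colMeans(gp_noloop)
print(expected_col_means)
```

Each column has n - 1 cells with probability j/n and one cell fixed at zero, so the column mean is ((n - 1)/n)(j/n), which is 0.9(j/10) for n = 10.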

As the dyadic counterpart to both rgraph and rgnm, rguman models digraphs whose distributions are parameterized by dyad states. As each dyad corresponds to a pair of edge variables, it can be readily classified into the three isomorphism classes of mutual (both edges present), asymmetric (one edge present), or null (no edges present). The number of dyads in each class within a graph is known as its dyad census, and has been used as a simple basis for modeling network structure at least since the work of Holland and Leinhardt (1970). rguman can be employed either to generate uniform digraphs conditional on an exact dyad census constraint, or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0
R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0
R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
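The dyad census that these three graphs are built to satisfy can be tabulated with a few lines of base R. The sketch below uses a hypothetical helper, dyad_census_sketch, written only for illustration, and applies it to hand-built complete, tournament, and null graphs of order 10 rather than to rguman output.

```r
# Hypothetical helper: count mutual, asymmetric, and null dyads in a
# 0/1 adjacency matrix, ignoring the diagonal.
dyad_census_sketch <- function(g) {
  ut <- upper.tri(g)
  a  <- g[ut]      # i -> j edge variables for i < j
  b  <- t(g)[ut]   # j -> i edge variables for the same dyads
  c(mut  = sum(a == 1 & b == 1),
    asym = sum(a != b),
    null = sum(a == 0 & b == 0))
}

n <- 10
complete <- matrix(1, n, n)
diag(complete) <- 0
tournament <- matrix(0, n, n)
tournament[upper.tri(tournament)] <- 1   # every i < j dyad holds one edge, i -> j
empty <- matrix(0, n, n)

census <- rbind(complete   = dyad_census_sketch(complete),
                tournament = dyad_census_sketch(tournament),
                null       = dyad_census_sketch(empty))
print(census)
```

In each case all choose(10, 2) = 45 dyads fall into a single class, matching the mut, asym, and null arguments used in the exact draws above.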

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
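The figures quoted for the Bernoulli comparison follow from a short calculation, sketched here in base R. The variable names are chosen for illustration; the only inputs are the dyad type probabilities from the rguman call above.

```r
p_mut  <- 0.15   # probability that a dyad is mutual
p_asym <- 0.05   # probability that a dyad is asymmetric

# Expected density: a mutual dyad fills both of its two edge variables,
# an asymmetric dyad fills one of them.
density <- p_mut + p_asym / 2

# Under a Bernoulli graph with this density, edges are independent.
bern_mut  <- density^2                    # both edges present
bern_asym <- 2 * density * (1 - density)  # exactly one edge present
ratio     <- bern_asym / bern_mut

print(c(density = density, bern_mut = bern_mut, ratio = ratio))
# density 0.175, bern_mut 0.030625, ratio about 9.43
```

The multinomial dyad model above assigns mutual dyads a probability of 0.15, roughly five times the 0.03 expected under the density-matched Bernoulli graph.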

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge, given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which, by default, correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases (π), “sibling” or shared partner biases (σ), and “double role” biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+   d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
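To make the idea of row/column permutation concrete, the following is a minimal base R sketch (permute_vertices is a hypothetical helper of ours, not an sna function). It applies the same permutation to rows and columns of an adjacency matrix and confirms that edge count and sorted degree sequences are unchanged.

```r
# Hypothetical helper: relabel vertices by applying one permutation to both
# the rows and the columns of an adjacency matrix.
permute_vertices <- function(adj, perm = sample.int(nrow(adj))) {
  adj[perm, perm, drop = FALSE]
}

set.seed(42)
adj <- matrix(rbinom(36, 1, 0.3), 6, 6)
diag(adj) <- 0
perm <- sample.int(6)
adj_perm <- permute_vertices(adj, perm)

# Edge count and (sorted) in/outdegree sequences are invariant under relabeling
same_edges  <- sum(adj) == sum(adj_perm)
same_outdeg <- all(sort(rowSums(adj)) == sort(rowSums(adj_perm)))
same_indeg  <- all(sort(colSums(adj)) == sort(colSums(adj_perm)))
c(same_edges, same_outdeg, same_indeg)
```

Because the same permutation is applied to both margins, vertex i of the permuted graph simply inherits the ties of vertex perm[i] in the original.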

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple, exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
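The generic change-score computation described here can be sketched in a few lines of base R. This is an illustration of the idea only, not the sna implementation; edge_perturbation, edge_count, and mutual_count are hypothetical names introduced for the example.

```r
# Hypothetical helper: difference in a graph statistic between forcing the
# (i, j) edge to be present and forcing it to be absent, all else held fixed.
edge_perturbation <- function(adj, i, j, stat) {
  present <- adj
  present[i, j] <- 1
  absent <- adj
  absent[i, j] <- 0
  stat(present) - stat(absent)
}

# Two simple graph-level statistics to perturb (illustrative only)
edge_count   <- function(adj) sum(adj != 0)
mutual_count <- function(adj) sum(adj != 0 & t(adj) != 0 & upper.tri(adj))

adj <- matrix(0, 3, 3)
adj[2, 1] <- 1   # a single edge from vertex 2 to vertex 1

delta_edges  <- edge_perturbation(adj, 1, 2, edge_count)    # toggling adds one edge
delta_mutual <- edge_perturbation(adj, 1, 2, mutual_count)  # completes a mutual dyad
delta_none   <- edge_perturbation(adj, 1, 3, mutual_count)  # no reciprocating edge
```

Note that the statistic is recomputed from scratch twice per call, which is exactly the inefficiency the text warns about: a specialized change-score for mutual dyads would need only to inspect the single cell adj[j, i].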

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
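As an illustration of what a class-restricted permutation looks like, here is a minimal base R sketch. The function restricted_perm is our own hypothetical stand-in and makes no claim to match the interface or internals of rperm; it simply shuffles positions independently within each equivalence class.

```r
# Hypothetical helper: draw a permutation vector in which positions may only
# be exchanged with other positions in the same equivalence class.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # sample.int avoids the sample() pitfall when idx has length one
    perm[idx] <- idx[sample.int(length(idx))]
  }
  perm
}

set.seed(7)
classes <- c(1, 1, 2, 2, 2, 3)
perm <- restricted_perm(classes)

# Every position is mapped to a position of the same class
all(classes[perm] == classes)
```

Shuffling each class uniformly and independently yields a uniform draw over the permutations that respect the class structure.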

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex v in graph G is defined as G[v ∪ N(v)] (i.e., the subgraph of G induced by v and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
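The "identify the neighborhood, then apply a function to the neighbors' covariates" pattern can be sketched in base R as follows. This is a first-order illustration only; neighbor_apply and its direction argument are hypothetical names of ours and do not reproduce the gapply interface.

```r
# Hypothetical helper: apply `fun` to the covariate values of each vertex's
# first-order neighbors (out, incoming, or combined).
neighbor_apply <- function(adj, covar, fun,
                           direction = c("out", "incoming", "combined")) {
  direction <- match.arg(direction)
  a <- adj != 0
  diag(a) <- FALSE
  nb <- switch(direction,
               out = a,
               incoming = t(a),
               combined = a | t(a))
  sapply(seq_len(nrow(nb)), function(v) fun(covar[nb[v, ]]))
}

adj <- matrix(0, 3, 3)
adj[1, 2] <- 1
adj[1, 3] <- 1
adj[3, 1] <- 1
covar <- c(10, 20, 30)

out_sum  <- neighbor_apply(adj, covar, sum, "out")
in_sum   <- neighbor_apply(adj, covar, sum, "incoming")
n_alters <- neighbor_apply(adj, rep(1, 3), sum, "combined")
```

Supplying a vector of ones with sum recovers degree, which is the "most trivial" use mentioned in connection with gapply; any other summary (mean, quantile, and so on) can be substituted for fun.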

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose i, j cell is an indicator for the membership of vertex j in vertex i's selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix; for orders k > 0, it depends on the type of adjacency involved. For input graph G = (V, E), let the base relation, R, be given by the underlying graph of G (i.e., G ∪ G^T) if total neighborhoods are sought, the transpose of G if incoming neighborhoods are sought, or G otherwise. The partial neighborhood structure of order k > 0 on R is then defined to be the digraph on V whose edge set consists of the ordered pairs (i, j) having geodesic distance k in R. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to k in R. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
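The definitions of partial and cumulative neighborhoods translate directly into a computation on geodesic distances. The sketch below uses base R only; geodist_matrix and neighborhood_sketch are hypothetical helpers of ours (with argument names chosen for the example), not the sna neighborhood function. The cumulative case follows the definition literally, so pairs at distance 0 (the diagonal) are included.

```r
# Geodesic distances: the first k at which a walk of length k from i to j
# exists is the geodesic distance from i to j.
geodist_matrix <- function(adj) {
  n <- nrow(adj)
  a <- (adj != 0) * 1
  diag(a) <- 0
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  reach <- diag(n)
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% a > 0) * 1
    d[reach == 1 & is.infinite(d)] <- k
  }
  d
}

# Hypothetical helper: indicator matrix for order-k neighborhoods
neighborhood_sketch <- function(adj, order,
                                type = c("out", "incoming", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  base <- switch(type,
                 out = adj,
                 incoming = t(adj),
                 total = (adj != 0 | t(adj) != 0) * 1)
  d <- geodist_matrix(base)
  if (partial) (d == order) * 1 else (d <= order) * 1
}

# Directed path 1 -> 2 -> 3
adj <- matrix(0, 3, 3)
adj[1, 2] <- 1
adj[2, 3] <- 1
part2 <- neighborhood_sketch(adj, 2, "out")
cum2  <- neighborhood_sketch(adj, 2, "out", partial = FALSE)
```

On this path, the order-2 partial out-neighborhood contains only the pair (1, 3), while the cumulative version also contains the two direct edges and the diagonal.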

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below:

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+   sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
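For readers who want to see the alter-density calculation spelled out, the following base R sketch extracts a combined-neighborhood ego net with ego in the first row/column and then computes directed density among alters only. Both ego_net and alter_density are hypothetical helpers of ours, written to mirror the convention described above rather than to reproduce ego.extract or gden.

```r
# Hypothetical helper: induced subgraph on ego plus its combined (in or out)
# neighborhood, with ego placed first.
ego_net <- function(adj, v) {
  is_alter <- (adj[v, ] != 0 | adj[, v] != 0) & seq_len(nrow(adj)) != v
  keep <- c(v, which(is_alter))
  adj[keep, keep, drop = FALSE]
}

# Hypothetical helper: directed density among alters (ego's row/column dropped)
alter_density <- function(ego) {
  k <- nrow(ego) - 1
  if (k < 2) return(NA_real_)   # density is undefined with fewer than 2 alters
  alters <- ego[-1, -1, drop = FALSE]
  diag(alters) <- 0
  sum(alters != 0) / (k * (k - 1))
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- 1
adj[1, 3] <- 1
adj[2, 3] <- 1   # a tie between two of vertex 1's alters

ego1 <- ego_net(adj, 1)
dens1 <- alter_density(ego1)
dens4 <- alter_density(ego_net(adj, 4))   # vertex 4 is an isolate
```

Returning NA for egos with fewer than two alters plays the same role as the ego.size > 2 guard in the transcript above.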

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2 / 9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function, below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order partial neighborhood of v.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order cumulative neighborhood of v.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5, below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported, via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which f is defined (with associated vertex set V). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected G. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{dt}(v, G) \equiv c_{d+}(v, G) + c_{d-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in G, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in G containing $v'$, and $g'(v', v, v'', G) / g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex v to vertex v′. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any v lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
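The definitions above can be checked on small graphs with a direct (and deliberately naive) base R computation. The sketch below is ours, not the sna implementation: geodesics and centrality_sketch are hypothetical helpers, betweenness and stress are summed over ordered pairs of third parties (an assumption suited to the directed case), and geodesic counts rely on the fact that any walk whose length equals the geodesic distance is itself a geodesic.

```r
# Geodesic distances and geodesic counts via walk counting
geodesics <- function(adj) {
  n <- nrow(adj)
  a <- (adj != 0) * 1
  diag(a) <- 0
  dist <- matrix(Inf, n, n)
  diag(dist) <- 0
  count <- diag(n)   # one trivial geodesic from each vertex to itself
  walks <- diag(n)   # walks[i, j]: number of length-k walks from i to j
  for (k in seq_len(n - 1)) {
    walks <- walks %*% a
    new <- walks > 0 & is.infinite(dist)
    dist[new] <- k
    count[new] <- walks[new]
  }
  list(dist = dist, count = count)
}

centrality_sketch <- function(adj) {
  n <- nrow(adj)
  g <- geodesics(adj)
  between <- stress <- numeric(n)
  for (v in seq_len(n)) for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j || i == v || j == v) next
    # v lies on an (i, j) geodesic iff the distances through v add up exactly
    if (is.finite(g$dist[i, j]) &&
        g$dist[i, v] + g$dist[v, j] == g$dist[i, j]) {
      through <- g$count[i, v] * g$count[v, j]
      stress[v] <- stress[v] + through
      between[v] <- between[v] + through / g$count[i, j]
    }
  }
  a <- (adj != 0) * 1
  diag(a) <- 0
  list(outdegree = rowSums(a), indegree = colSums(a),
       closeness = (n - 1) / rowSums(g$dist),   # 0 when some distance is Inf
       betweenness = between, stress = stress)
}

# Directed path 1 -> 2 -> 3, and an undirected 4-cycle stored symmetrically
path <- matrix(0, 3, 3)
path[1, 2] <- 1
path[2, 3] <- 1
cycle <- matrix(0, 4, 4)
cycle[cbind(1:4, c(2, 3, 4, 1))] <- 1
cycle <- cycle + t(cycle)

cent_path  <- centrality_sketch(path)
cent_cycle <- centrality_sketch(cycle)
```

On the directed path, vertices 2 and 3 cannot reach every other vertex, so their closeness is 0, which illustrates the fragility noted in the text. The cubic loop is only practical for small graphs.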

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of A (where A is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., v is central to the extent to which it has many ties to other central nodes), or of the ability of v to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as β approaches the reciprocal of the principal eigenvalue of A, and degree as β approaches 0. Setting β < 0 reverses the sense of the dependence of centrality scores across vertices: where β is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of β. The scaling parameter α is by convention set so as to result in a centrality vector of length equal to $\sqrt{|V|}$ (i.e., a sum of squared scores equal to |V|); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
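The Bonacich power formula can be evaluated directly with a linear solve. The following base R sketch is an illustration under stated assumptions, not the bonpow source: bonacich_power is a hypothetical name, and α is chosen so that the sum of squared scores equals the number of vertices (the convention we assume here, following Bonacich 1987).

```r
# Hypothetical helper: alpha * (I - beta * A)^{-1} A 1, with alpha set so that
# the sum of squared scores equals the number of vertices.
bonacich_power <- function(adj, beta) {
  n <- nrow(adj)
  # solve(M, b) returns M^{-1} b without forming the inverse explicitly
  raw <- as.vector(solve(diag(n) - beta * adj, adj %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 0,
                1, 0, 0, 1,
                0, 0, 0, 0), 4, 4, byrow = TRUE)
outdeg <- rowSums(adj)

bp_zero <- bonacich_power(adj, 0)      # proportional to outdegree
bp_neg  <- bonacich_power(adj, -0.5)   # negative dependence on alters' scores

bp_zero / outdeg   # constant wherever outdegree is positive
```

With β = 0 the inverse term drops out and the scores reduce to a rescaled row sum of A, i.e., outdegree. The solve will fail where I − βA is singular, which corresponds to the "where a solution exists" caveat.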

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, v, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which v serves as a local bridge. Now, let us posit a vector of states s, indexed by V, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex v with respect to a particular role is defined as the number of ordered triads of the appropriate type for which v is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed s and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
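The role definitions can be turned into a brute-force count over ordered triads. The base R sketch below is an illustration of those definitions only; brokerage_sketch is a hypothetical helper of ours, it returns raw counts without the expectations or z-scores that brokerage reports, and its cubic loop is suited only to small graphs.

```r
# Hypothetical helper: raw Gould-Fernandez brokerage counts per vertex and role
brokerage_sketch <- function(adj, s) {
  n <- nrow(adj)
  roles <- c("coordinating", "itinerant", "gatekeeping",
             "representative", "liaison")
  scores <- matrix(0, n, length(roles), dimnames = list(seq_len(n), roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers (i, k) when i -> j and j -> k exist but i -> k does not
    if (adj[i, j] == 0 || adj[j, k] == 0 || adj[i, k] != 0) next
    role <- if (s[i] == s[j] && s[j] == s[k]) {
      "coordinating"
    } else if (s[i] == s[k]) {
      "itinerant"
    } else if (s[j] == s[k]) {
      "gatekeeping"
    } else if (s[i] == s[j]) {
      "representative"
    } else {
      "liaison"
    }
    scores[j, role] <- scores[j, role] + 1
  }
  cbind(scores, total = rowSums(scores))
}

# Directed path 1 -> 2 -> 3: vertex 2 brokers the ordered pair (1, 3)
adj <- matrix(0, 3, 3)
adj[1, 2] <- 1
adj[2, 3] <- 1
itinerant_case <- brokerage_sketch(adj, c("a", "b", "a"))
liaison_case   <- brokerage_sketch(adj, c("a", "b", "c"))
```

Changing only the state vector moves the same structural triad between roles, which is why the group assignment must be supplied alongside the network.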

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
[1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
[1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
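The equivalence of Equations 1 and 2 can be checked by hand on a small star graph. The following is a minimal base R sketch that does not use sna; the helper name freeman_centralization is hypothetical, and the normalizing constant (n - 1)(n - 2) is the standard theoretical maximum deviation for undirected degree centrality, supplied here as an assumption rather than queried from any centrality routine.

```r
# Raw or normalized Freeman centralization from a vector of centrality scores.
# `tmaxdev` is the theoretical maximum deviation; if NULL, the raw index is returned.
freeman_centralization <- function(scores, tmaxdev = NULL) {
  raw <- sum(max(scores) - scores)            # Equation (1)
  if (is.null(tmaxdev)) raw else raw / tmaxdev
}

# Undirected star on n = 5 vertices: vertex 1 is tied to all others.
n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1
deg <- rowSums(star)                          # degree centrality scores

raw_eq1 <- freeman_centralization(deg)        # sum of deviations from the maximum
raw_eq2 <- n * (max(deg) - mean(deg))         # Equation (2)
norm    <- freeman_centralization(deg, tmaxdev = (n - 1) * (n - 2))
```

For the star graph the two raw forms agree, and the normalized degree centralization reaches its upper bound of 1.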

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied, respectively, by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
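To make the definitions behind these indices concrete, the following base R sketch computes density, the three reciprocity variants, and weak transitivity directly from a small adjacency matrix. It is an illustration of the definitions given above, not a reimplementation of gden, grecip, or gtrans; all variable names are hypothetical, and loops and missing data are assumed absent.

```r
# A small loopless digraph: 1 <-> 2, 2 -> 3, 1 -> 3, 3 -> 4.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- 1
A[1, 3] <- 1
A[3, 4] <- 1
n <- nrow(A)

# Density: observed edges over the n(n - 1) possible directed edges.
dens <- sum(A) / (n * (n - 1))

# Dyad counts, taken over the upper triangle so each dyad is visited once.
ut   <- upper.tri(A)
mut  <- sum(A[ut] == 1 & t(A)[ut] == 1)
null <- sum(A[ut] == 0 & t(A)[ut] == 0)
asym <- sum(ut) - mut - null

recip_dyadic   <- (mut + null) / (mut + asym + null)  # symmetric dyads
recip_nonnull  <- mut / (mut + asym)                  # symmetric among non-null dyads
recip_edgewise <- 2 * mut / (2 * mut + asym)          # reciprocated edges among edges

# Weak transitivity: among ordered triples of distinct vertices with
# i -> j and j -> k (those "at risk"), the fraction that also have i -> k.
at_risk <- 0
closed  <- 0
for (i in 1:n) for (j in 1:n) for (k in 1:n) {
  if (i != j && j != k && i != k && A[i, j] == 1 && A[j, k] == 1) {
    at_risk <- at_risk + 1
    closed  <- closed + A[i, k]
  }
}
trans_weak <- closed / at_risk
```

Here the dyadic reciprocity (0.5) exceeds the edgewise reciprocity (0.4) because null dyads count as symmetric under the former but are ignored by the latter.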

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if (tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let G = (V, E) be a (possibly directed) graph of order N, and let d(i, j) be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
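The structure statistics follow directly from a geodesic distance matrix. The base R sketch below assumes an unvalued adjacency matrix and uses a Floyd-Warshall pass in place of geodist; the function names are hypothetical and are not those of sna.

```r
# All-pairs geodesic distances by Floyd-Warshall; Inf marks unreachable pairs.
geodesic_distances <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in 1:n) for (i in 1:n) for (j in 1:n) {
    if (D[i, k] + D[k, j] < D[i, j]) D[i, j] <- D[i, k] + D[k, j]
  }
  D
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), i = 0, ..., N - 1.
structure_stats <- function(A) {
  N <- nrow(A)
  D <- geodesic_distances(A)
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  names(s) <- 0:(N - 1)
  s
}

# Directed 3-cycle: 1 -> 2 -> 3 -> 1.
A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- A[3, 1] <- 1
s <- structure_stats(A)
```

Because d(j, j) = 0 for every vertex, s_0 always equals 1/N; on the 3-cycle the series then rises by one third per step until every ordered pair is covered.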

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}\,s(H) = s(G), \mathbf{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
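The logic of a CUG test can be sketched in a few lines of base R. The example below conditions on order and number of edges and uses the count of mutual dyads as the test statistic; it is a hand-rolled Monte Carlo illustration under those assumptions, with hypothetical function names, and is not the implementation used by cugtest.

```r
set.seed(1)

# Test statistic: number of mutual dyads in a loopless digraph.
count_mutuals <- function(A) sum(A * t(A)) / 2

# Uniform draw from the loopless digraphs with n vertices and exactly m edges.
rgraph_nm <- function(n, m) {
  A <- matrix(0, n, n)
  offdiag <- which(row(A) != col(A))
  A[sample(offdiag, m)] <- 1
  A
}

# One-tailed Monte Carlo p-values, Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_test_edges <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  m <- sum(A)
  obs  <- stat(A)
  sims <- replicate(reps, stat(rgraph_nm(n, m)))
  list(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

# Observed graph: six edges arranged as three mutual dyads.
A <- matrix(0, 6, 6)
A[cbind(1:6, c(2, 1, 4, 3, 6, 5))] <- 1
res <- cug_test_edges(A, count_mutuals)
```

With only six edges spread over 30 possible positions, three mutual dyads are very unlikely under the null, so the upper-tail p-value is small while the lower-tail p-value is close to 1.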

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
  p(f(rnd) >= f(d)): 0.299
  p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary:
    Min:  -0.3333333
    1stQ: -0.06666667
    Med:  0
    Mean: -0.001288889
    3rdQ: 0.06666667
    Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
  p(f(rnd) >= f(d)): 0.967
  p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary:
    Min:  -0.06666667
    1stQ: 0.1555556
    Med:  0.2222222
    Mean: 0.2215333
    3rdQ: 0.2888889
    Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v′ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when v and v′ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and outneighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
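The distance-then-cluster workflow described above can be sketched with base R alone. The helper se_hamming below is hypothetical: it compares the row and column profiles of each vertex pair by Hamming distance, and the choice to skip the cells involving the pair itself is an assumption of this sketch, not a statement about how sedist treats them.

```r
# Hamming distance between the in/out profiles of each vertex pair (i, j),
# ignoring the cells that involve i and j themselves.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    if (i != j) {
      keep <- setdiff(1:n, c(i, j))
      D[i, j] <- sum(A[i, keep] != A[j, keep]) +   # out-ties
                 sum(A[keep, i] != A[keep, j])     # in-ties
    }
  }
  D
}

# Vertices 1 and 2 both send ties to 3 and 4; vertices 3 and 4 send none.
A <- matrix(0, 4, 4)
A[1, 3:4] <- 1
A[2, 3:4] <- 1

D  <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")   # standard hierarchical clustering
classes <- cutree(hc, k = 2)                    # membership vector for two positions
```

In this toy graph the two senders are exactly structurally equivalent (distance 0), as are the two receivers, so cutting the tree at two clusters recovers the two positions.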

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the i, jth block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
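Density reduction and expansion can be illustrated with a short base R sketch. The functions block_density and expand_density are hypothetical stand-ins written for this example (they are not blockmodel or blockmodel.expand), and they assume a loopless, unvalued digraph and a membership vector coded 1, ..., k.

```r
set.seed(42)

# Density reduction: mean cell value within each (class r, class s) block,
# neglecting the diagonal for within-class blocks.
block_density <- function(A, classes) {
  k <- max(classes)
  B <- matrix(NA_real_, k, k)
  for (r in 1:k) for (s in 1:k) {
    cells <- A[classes == r, classes == s, drop = FALSE]
    if (r == s) {
      off <- row(cells) != col(cells)
      B[r, s] <- if (any(off)) mean(cells[off]) else NA_real_
    } else {
      B[r, s] <- mean(cells)
    }
  }
  B
}

# Expansion: treat block densities as edge probabilities and draw a Bernoulli
# graph with `sizes[r]` vertices in class r (sizes may differ from the original).
expand_density <- function(B, sizes) {
  classes <- rep(seq_along(sizes), sizes)
  P <- B[classes, classes]                 # vertex-level probability matrix
  n <- length(classes)
  G <- matrix(rbinom(n * n, 1, P), n, n)
  diag(G) <- 0
  G
}

# Two senders (class 1) tied to two receivers (class 2).
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1
B <- block_density(A, c(1, 1, 2, 2))
G <- expand_density(B, sizes = c(3, 3))    # a 6-vertex draw from the same image
```

Because every block density here is 0 or 1, the expanded graph is deterministic: each of the three class-1 vertices sends ties to all three class-2 vertices.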

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition G′′ of graphs G and G′ on vertex set V is the graph on V such that (v, v′) ∈ E(G′′) iff there exists a vertex v′′ such that (v, v′′) ∈ G and (v′′, v′) ∈ G′. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
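The boolean inner product form of graph composition is easy to write out in base R. The sketch below uses a hypothetical helper named compose on plain adjacency matrices and shows how a loop can arise from two loopless inputs.

```r
# Composition via the boolean product of adjacency matrices:
# (v, v') is an edge iff some v'' has (v, v'') in G and (v'', v') in G'.
compose <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3)
A[1, 2] <- 1                  # G:  1 -> 2
B <- matrix(0, 3, 3)
B[2, 1] <- 1                  # G': 2 -> 1
B[2, 3] <- 1                  #     2 -> 3

C <- compose(A, B)            # contains 1 -> 3 and the loop 1 -> 1
```

Neither input has a nonzero diagonal, yet the composed graph contains the loop at vertex 1, produced by the two-step path 1 -> 2 -> 1.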

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| (|V| - 1)}, \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of G and H, and $\mu_X = (|V| (|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then cov(G, G), and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
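Equation 3 and the associated correlation and Hamming distance translate directly into base R. The sketch below assumes loopless digraphs on a common vertex set; the helper names (offdiag, gcov_base, gcor_base, hamming) are hypothetical and are written out here only to mirror the formulas, not to reproduce gcov, gcor, or hdist.

```r
# Off-diagonal cells of an adjacency matrix: the |V|(|V| - 1) edge variables.
offdiag <- function(A) A[row(A) != col(A)]

# Graph covariance, Equation (3): mean cross-product of centered edge variables.
gcov_base <- function(A, B) {
  a <- offdiag(A)
  b <- offdiag(B)
  mean((a - mean(a)) * (b - mean(b)))
}

# Graph correlation: covariance scaled by the two graph standard deviations.
gcor_base <- function(A, B) {
  gcov_base(A, B) / sqrt(gcov_base(A, A) * gcov_base(B, B))
}

# Hamming distance: number of edge variables on which the graphs disagree.
hamming <- function(A, B) sum(offdiag(A) != offdiag(B))

# A directed 3-cycle and its (loopless) complement.
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
comp <- 1 - A
diag(comp) <- 0

v  <- gcov_base(A, A)       # graph variance
r1 <- gcor_base(A, A)       # a graph correlates perfectly with itself
r2 <- gcor_base(A, comp)    # and perfectly negatively with its complement
h  <- hamming(A, comp)      # every edge variable differs
```

The complement flips every off-diagonal cell, which is why the correlation is exactly -1 and the Hamming distance equals the number of edge variables.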

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
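The QAP null hypothesis amounts to relabeling the vertices of one graph at random while holding the other fixed. A minimal base R sketch of such a test for the graph correlation follows; the function names are hypothetical and the code is an illustration of the permutation idea under simple assumptions (loopless digraphs, simultaneous row and column permutation), not the procedure implemented by qaptest.

```r
set.seed(7)

offdiag   <- function(A) A[row(A) != col(A)]
gcor_base <- function(A, B) cor(offdiag(A), offdiag(B))

# QAP test: permute rows and columns of A together, recompute the statistic,
# and compare the observed value against the permutation distribution.
qap_test <- function(A, B, reps = 1000) {
  n   <- nrow(A)
  obs <- gcor_base(A, B)
  sims <- replicate(reps, {
    p <- sample(n)                 # random relabeling of the vertices
    gcor_base(A[p, p], B)
  })
  list(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

# A random digraph compared against itself: the observed correlation is 1,
# and almost no relabeling reproduces it.
n <- 8
A <- matrix(rbinom(n * n, 1, 0.4), n, n)
diag(A) <- 0
res <- qap_test(A, A)
```

Because row and column permutations are applied jointly, each replicate preserves the full structure of the permuted graph; only the assignment of vertices to positions changes.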


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306

R> gcor(g1, g3)
[1] 0.08908708

R> gcor(g2, g3)
[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173  0.680   0.320   0.503
x1           0.9411361  2.5628914  0.985   0.015   0.019
x2           4.1473292 63.2648084  1.000   0.000   0.000
x3           1.8630911  6.4436238  1.000   0.000   0.000
x4          -0.1757242  0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
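The reparameterization mentioned above can be made concrete with a small sketch. It assumes the reading of the Batchelder-Romney process given in the text: a source "knows" the true edge state with probability equal to its competency, and otherwise guesses "tie present" with probability equal to its bias. The function names below are hypothetical helpers written for illustration; they are not part of sna.

```r
# Map (competency, bias) to false positive (ep) and false negative (em) rates.
# Assumption: with probability `competency` the source reports the truth;
# otherwise it reports a tie with probability `bias`.
br_to_error_rates <- function(competency, bias) {
  stopifnot(all(competency >= 0), all(competency <= 1),
            all(bias >= 0), all(bias <= 1))
  list(ep = (1 - competency) * bias,        # guess "tie" when none exists
       em = (1 - competency) * (1 - bias))  # guess "no tie" when one exists
}

# Inverse map; only defined when 0 < ep + em < 1, which is the condition
# under which the text says the two likelihoods can be matched.
error_rates_to_br <- function(ep, em) {
  total <- ep + em
  stopifnot(all(total > 0), all(total < 1))
  list(competency = 1 - total, bias = ep / total)
}

rates <- br_to_error_rates(competency = 0.6, bias = 0.25)
back <- error_rates_to_br(rates$ep, rates$em)
```

Under this mapping the competency equals one minus the total error rate, so a source with ep + em > 1 would need a negative competency. Such negatively informative sources are the case the Butts (2003) model is described as allowing.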

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
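As a rough illustration of the conditional probability Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k, the base R sketch below evaluates it for given bias probabilities and counts. The function name and the example values are hypothetical and chosen only for illustration; this is not how bn or rgbn are implemented.

```r
# Conditional nomination probability under a biased net model:
#   Pr(e_ij | T) = 1 - (1 - base_prob) * prod_k (1 - bias_probs[k])^counts[k]
# base_prob  : baseline probability Pr(B_e)
# bias_probs : probabilities Pr(B_k) of each type of bias event
# counts     : number s_k of potential occurrences of each bias event type
biased_net_prob <- function(base_prob, bias_probs = numeric(0),
                            counts = numeric(0)) {
  stopifnot(length(bias_probs) == length(counts),
            base_prob >= 0, base_prob <= 1,
            all(bias_probs >= 0), all(bias_probs <= 1),
            all(counts >= 0))
  1 - (1 - base_prob) * prod((1 - bias_probs)^counts)
}

# Hypothetical example: baseline 0.15, one bias event with probability 0.05,
# and two potential occurrences of a bias event with probability 0.1.
p_edge <- biased_net_prob(0.15, bias_probs = c(0.05, 0.1), counts = c(1, 2))
```

With no bias events the expression reduces to the baseline probability, and each additional potential bias event can only raise the probability of a nomination.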

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon, \tag{4}$$

$$\varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu, \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
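For simulating from the model it can help to write Equations 4 and 5 in reduced form. The following is a short derivation sketch, under the added assumption that both $I - \sum_i \theta_i W_i$ and $I - \sum_i \psi_i Z_i$ are invertible ($I$ denotes the $n \times n$ identity matrix, which does not appear in the original notation).

```latex
% Solve Equation 5 for the disturbances:
\varepsilon = \Bigl( I - \sum_{i=1}^{z} \psi_i Z_i \Bigr)^{-1} \nu
% Substitute into Equation 4 and solve for the responses:
y = \Bigl( I - \sum_{i=1}^{w} \theta_i W_i \Bigr)^{-1}
    \Bigl[ X\beta + \Bigl( I - \sum_{i=1}^{z} \psi_i Z_i \Bigr)^{-1} \nu \Bigr]
```

Read this way, a draw of $y$ takes two linear solves, one for $\varepsilon$ given $\nu$ and one for $y$ given $X\beta + \varepsilon$.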


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116   Med: 0.9992194   IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
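One way to screen for the “perverse” region described above is to estimate, for each informant, the posterior probability that em + ep > 1 from the posterior draws. The sketch below is a hypothetical helper, not an sna function. It assumes the draws are arranged with one row per draw and one column per informant, as the apply(b$em, 2, median) call above suggests, and it uses simulated stand-in draws rather than actual bbnam output.

```r
# Per-informant posterior probability of the "perverse" region em + ep > 1.
# Assumption: rows index posterior draws, columns index informants.
perverse_prob <- function(em_draws, ep_draws) {
  stopifnot(is.matrix(em_draws), is.matrix(ep_draws),
            identical(dim(em_draws), dim(ep_draws)))
  colMeans(em_draws + ep_draws > 1)
}

# Simulated stand-in for posterior draws (not produced by bbnam):
set.seed(1)
em_draws <- matrix(rbeta(100 * 20, 15, 25), nrow = 100)
ep_draws <- matrix(rbeta(100 * 20, 1, 25), nrow = 100)

# Informants for whom most posterior mass lies in the perverse region
flagged <- which(perverse_prob(em_draws, ep_draws) > 0.5)
```

With real output one would pass the corresponding draw matrices from the fitted object in place of the simulated ones. The 0.5 cutoff is an arbitrary illustrative threshold.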

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
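The sensitivity of the intersection LAS to false negatives can be seen from a back-of-the-envelope calculation. The sketch below assumes an idealized setting that is not the simulation above: the two endpoints of each dyad report independently with common error rates, the intersection rule records a tie only if both report it, and the union rule records a tie if either does. The function name is hypothetical.

```r
# Expected per-dyad accuracy of intersection and union LAS rules under
# independent reports with common false negative (em) and false positive (ep)
# rates, for a true graph with the given density (idealized assumptions).
las_accuracy <- function(em, ep, density = 0.5) {
  stopifnot(em >= 0, em <= 1, ep >= 0, ep <= 1, density >= 0, density <= 1)
  # Intersection: a true tie needs two correct reports; a true null is
  # missed only if both endpoints falsely report a tie.
  inter <- density * (1 - em)^2 + (1 - density) * (1 - ep^2)
  # Union: a true tie is missed only if both endpoints miss it; a true
  # null needs two correct reports.
  union <- density * (1 - em^2) + (1 - density) * (1 - ep)^2
  c(intersection = inter, union = union)
}

# Mean error rates used in the example above
acc <- las_accuracy(em = 0.375, ep = 0.038)
```

Because the intersection rule requires two correct reports of every true tie, a false negative rate near 0.375 costs it far more than the union rule loses to the much smaller false positive rate.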

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


or to draw from a multinomial graph model of independent dyads with fixed expected counts. The former case can be used to generate graphs of particular types. For instance, the trivial cases of complete, complete tournament, and null graphs can be generated by placing all dyads within the appropriate isomorphism class:

R> k10 <- rguman(1, 10, mut = 45, asym = 0, null = 0, method = "exact")
R> t10 <- rguman(1, 10, mut = 0, asym = 45, null = 0, method = "exact")
R> n10 <- rguman(1, 10, mut = 0, asym = 0, null = 45, method = "exact")
R> k10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    1    1    1    1    1    1    1     1
 [2,]    1    0    1    1    1    1    1    1    1     1
 [3,]    1    1    0    1    1    1    1    1    1     1
 [4,]    1    1    1    0    1    1    1    1    1     1
 [5,]    1    1    1    1    0    1    1    1    1     1
 [6,]    1    1    1    1    1    0    1    1    1     1
 [7,]    1    1    1    1    1    1    0    1    1     1
 [8,]    1    1    1    1    1    1    1    0    1     1
 [9,]    1    1    1    1    1    1    1    1    0     1
[10,]    1    1    1    1    1    1    1    1    1     0
R> t10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    1    0    0     0
 [2,]    1    0    1    0    1    1    0    0    0     1
 [3,]    1    0    0    1    1    0    0    1    0     0
 [4,]    1    1    0    0    0    1    0    1    0     1
 [5,]    1    0    0    1    0    1    1    1    1     0
 [6,]    1    0    1    0    0    0    1    1    1     0
 [7,]    0    1    1    1    0    0    0    1    1     0
 [8,]    1    1    0    0    0    0    0    0    1     1
 [9,]    1    1    1    1    0    0    0    0    0     0
[10,]    1    0    1    0    1    1    1    0    1     0
R> n10
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
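The Bernoulli comparison above follows from simple arithmetic, sketched here in base R. The helper names are hypothetical. The only assumptions are the standard ones: a directed graph's density equals the mutual dyad rate plus half the asymmetric dyad rate, and edges in a Bernoulli graph are independent with common probability p.

```r
# Edge density implied by dyad type probabilities (each mutual dyad carries
# two edges and each asymmetric dyad one, out of two possible per dyad).
density_from_dyads <- function(mut, asym) mut + asym / 2

# Expected dyad type rates in a Bernoulli graph with edge probability p.
dyad_rates_bernoulli <- function(p) {
  stopifnot(p >= 0, p <= 1)
  c(mut = p^2, asym = 2 * p * (1 - p), null = (1 - p)^2)
}

p <- density_from_dyads(mut = 0.15, asym = 0.05)   # density of the model above
rates <- dyad_rates_bernoulli(p)
asym_to_mut <- unname(rates["asym"] / rates["mut"])
```

The multinomial model above specifies a mutual rate of 0.15, about five times the Bernoulli value at the same density, while its asymmetric rate of 0.05 is far below what a Bernoulli graph would produce.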

More extensive departures from independence require alternatives to the simple independentedgedyad paradigm One such alternative is the Skvoretz-Fararo family of biased net pro-cesses which are discussed in more detail in Section 27 As we will see these processes arespecified in terms of the conditional probability of an edge given other edges within the graphthis immediately suggests the use of a Gibbs sampler (see eg (Gilks et al 1996)) to drawrealizations of the graph process Such a sampler is implemented via the rgbn function whichuses an iterative edge updating scheme to form a Markov chain whose equilibrium distribu-tion corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararoprocess Thinning and burn-in parameters may be specified by the user along with modelparameters (which by default correspond to the uniform random digraph model) Parame-ters may be adjusted to produce ldquoparentrdquo or reciprocity biases (π) ldquosiblingrdquo or shared partnerbiases (σ) and ldquodouble rolerdquo biases or parentsibling interaction effects (ρ) as well as baselinedensity effects (d) parameters vary from 0 to 1 with 0 indicating no bias The command todraw a sample of 5 order 10 networks with both reciprocity and triangle formation biases willthen look something like the following

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+     d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the “unlabeled” structure of the input graph (i.e., it draws from the graph’s isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
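To make the invariance concrete, the following base R sketch (independent of sna; the seed, graph size, and variable names are arbitrary choices made for illustration) applies one random relabeling to the rows and columns of an adjacency matrix and confirms that label-free summaries are unchanged.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
n <- 10
g <- matrix(rbinom(n * n, 1, 0.2), n, n)      # random 0/1 adjacency matrix
diag(g) <- 0                                  # no loops

perm   <- sample(n)                           # random relabeling of the vertices
g_perm <- g[perm, perm]                       # permute rows and columns together

# The labeled matrices generally differ, but label-free summaries agree
same_edges  <- sum(g) == sum(g_perm)
same_outdeg <- all(sort(rowSums(g)) == sort(rowSums(g_perm)))
same_indeg  <- all(sort(colSums(g)) == sort(colSums(g_perm)))
c(same_edges, same_outdeg, same_indeg)
```

Permuting rows and columns with the same vector is what keeps the result within the isomorphism class; permuting them separately would in general produce a different graph.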

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R’s diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
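The idea of an edge perturbation can be sketched generically in base R. The function below is an illustrative stand-in, not the sna implementation; edge_perturbation, edge_count, and mutual_count are hypothetical names. It evaluates an arbitrary statistic twice, once with the edge forced present and once with it forced absent, which is exactly the source of the inefficiency noted above.

```r
# Difference in a graph statistic between forcing edge (i, j) present
# and forcing it absent, holding all other edges constant
edge_perturbation <- function(adj, i, j, stat, ...) {
  present <- adj
  absent  <- adj
  present[i, j] <- 1
  absent[i, j]  <- 0
  stat(present, ...) - stat(absent, ...)
}

# Two simple graph-level statistics: edge count and number of mutual dyads
edge_count   <- function(adj) sum(adj)
mutual_count <- function(adj) sum(adj * t(adj)) / 2

g <- matrix(0, 4, 4)
g[2, 1] <- 1                                          # single edge 2 -> 1

d_edges  <- edge_perturbation(g, 1, 2, edge_count)    # 1 -> 2 adds one edge
d_mutual <- edge_perturbation(g, 1, 2, mutual_count)  # and completes one mutual dyad
d_none   <- edge_perturbation(g, 3, 4, mutual_count)  # 3 -> 4 has no reverse edge
c(d_edges, d_mutual, d_none)
```

A specialized change score for the mutual count would only need to inspect the reverse edge, rather than recomputing the statistic over the whole matrix twice.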

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph’s adjacency matrix is equivalent to altering the “identities” of its vertices, while leaving the underlying “unlabeled” structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
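A restricted permutation of the kind described above can be sketched in a few lines of base R. This is an illustration of the idea only, under the assumption that classes are given as one label per vertex; restricted_perm is a hypothetical name and the code is not the rperm implementation.

```r
# Draw a random permutation in which positions may only be exchanged
# within the same class (classes: one label per vertex)
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # index through sample.int() so that one-element classes are handled safely
    perm[idx] <- idx[sample.int(length(idx))]
  }
  perm
}

set.seed(42)                            # arbitrary seed
classes <- c(1, 1, 1, 2, 2, 3)          # hypothetical equivalence classes
p <- restricted_perm(classes)
p
```

Because each class is shuffled uniformly and independently, every permutation that respects the classes is equally likely. The result can be applied to an adjacency matrix as adj[p, p].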

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna’s descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or “ego net”) of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors’ covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex’s 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $i, j$ cell is an indicator for the membership of vertex $j$ in vertex $i$’s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
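The definitions above translate directly into a short base R sketch. This is an illustration under stated assumptions rather than the sna routine: geodesic_dist and neighborhood_sketch are hypothetical names, distances are found by repeated boolean matrix multiplication, and the cumulative neighborhood is assumed here to exclude self-pairs (distance 0).

```r
# Geodesic distances by repeated boolean matrix multiplication:
# the first k at which a walk of length k joins i to j is their distance
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  a <- (adj != 0) * 1
  dist <- matrix(Inf, n, n)
  diag(dist) <- 0
  walk <- diag(n)
  for (k in seq_len(n - 1)) {
    walk <- (walk %*% a > 0) * 1
    dist[walk == 1 & is.infinite(dist)] <- k
  }
  dist
}

# type selects the base relation R: G ("out"), its transpose ("in"),
# or the underlying graph of G ("total")
neighborhood_sketch <- function(adj, order, type = c("out", "in", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  base <- switch(type,
                 out   = adj,
                 "in"  = t(adj),
                 total = (adj + t(adj) > 0) * 1)
  if (order == 0) return(diag(nrow(adj)))       # 0th order: identity matrix
  d <- geodesic_dist(base)
  if (partial) (d == order) * 1 else (d > 0 & d <= order) * 1
}

# Directed path 1 -> 2 -> 3 -> 4
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1

part2 <- neighborhood_sketch(g, 2)                   # pairs at distance exactly 2
cum2  <- neighborhood_sketch(g, 2, partial = FALSE)  # pairs at distance 1 or 2
```

On the directed path, the partial neighborhood of order 2 contains only the pairs (1, 3) and (2, 4), while the cumulative neighborhood of order 2 adds the three original edges.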

To illustrate sna’s egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+     sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego’s contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
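For readers who want to see the mechanics, the ego-first convention and the alter-only density can be sketched in base R as follows. The functions ego_net and alter_density are hypothetical helpers written for this illustration; they are not the sna routines, and the small digraph is invented.

```r
# Extract the ego net of vertex v, placing ego in the first row/column
ego_net <- function(adj, v, type = c("combined", "in", "out")) {
  type <- match.arg(type)
  out_nb <- which(adj[v, ] != 0)
  in_nb  <- which(adj[, v] != 0)
  alters <- switch(type,
                   combined = union(out_nb, in_nb),
                   "in"     = in_nb,
                   out      = out_nb)
  alters <- setdiff(alters, v)          # ignore loops
  keep <- c(v, alters)
  adj[keep, keep, drop = FALSE]
}

# Density among alters only (directed, no loops), dropping ego's row/column
alter_density <- function(ego) {
  k <- nrow(ego) - 1                    # number of alters
  if (k < 2) return(NA_real_)           # undefined with fewer than two alters
  alters <- ego[-1, -1, drop = FALSE]
  diag(alters) <- 0
  sum(alters) / (k * (k - 1))
}

# Small hypothetical digraph: 1 -> 2, 1 -> 3, 2 -> 3, 4 -> 1
g <- matrix(0, 4, 4)
g[rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 1))] <- 1

e1 <- ego_net(g, 1)     # ego 1 with alters 2, 3, 4
alter_density(e1)       # one tie (2 -> 3) among three alters: 1 / 6
```

Returning NA for ego nets with fewer than two alters is a design choice of this sketch; the example in the text instead restricts attention to ego nets of size greater than 2.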

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+     cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+     gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+     partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+     gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods “fill out” the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary “workhorse” routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For “target diagrams,” in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order partial neighborhood of v.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R’s optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order cumulative neighborhood of v.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+     vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+     displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+     xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+     col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+     xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+     offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+     arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+     1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R’s built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or “central” position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized “paring down” of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or “Freeman” degree ($c_{d_t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as $c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)}$, where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v', v, v'', G)}{g(v', v'', G)}$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by $c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')}$, where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman’s measures.
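The degree and closeness definitions, and the fragility of closeness on disconnected graphs, can be illustrated with a short base R sketch. It is written independently of sna: geodist_fw, closeness_sketch, outdeg, and indeg are hypothetical names, and distances are computed with the Floyd-Warshall algorithm as one convenient choice.

```r
# All-pairs geodesic distances by Floyd-Warshall (Inf where no path exists)
geodist_fw <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj != 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n)) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# Closeness as (n - 1) / (sum of distances); infinite sums give 0
closeness_sketch <- function(adj) {
  d <- geodist_fw(adj)
  (nrow(adj) - 1) / rowSums(d)
}

# Out- and indegree for a 0/1 adjacency matrix
outdeg <- function(adj) rowSums(adj != 0)
indeg  <- function(adj) colSums(adj != 0)

# Undirected star on 4 vertices (vertex 1 is the hub)
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1
cl_star <- closeness_sketch(star)    # hub: 3/3 = 1; leaves: 3/(1 + 2 + 2) = 0.6

# Adding an isolate makes every distance sum infinite, so all scores are 0
with_isolate <- rbind(cbind(star, 0), 0)
cl_iso <- closeness_sketch(with_isolate)
```

The second example shows the behavior described in the text: a single unreachable vertex is enough to drive every closeness score to 0.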

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of “coreness” (or membership in the largest dense cluster), “recursive” or “reflected” degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector whose squared length is equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
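The limiting behavior of the Bonacich measure can be checked numerically with base R alone. The sketch below is not the bonpow implementation; bonacich_power is a hypothetical name, the scaling assumes that squared scores sum to the number of vertices, and the test graph (an undirected path on four vertices) is invented for illustration.

```r
# Bonacich power: alpha * (I - beta * A)^{-1} A 1, with alpha chosen so that
# the squared scores sum to the number of vertices (assumed convention)
bonacich_power <- function(adj, beta) {
  n <- nrow(adj)
  raw <- solve(diag(n) - beta * adj, adj %*% rep(1, n))
  as.vector(raw * sqrt(n / sum(raw^2)))
}

# Small hypothetical undirected graph: path 1 - 2 - 3 - 4
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1
g <- g + t(g)

deg    <- rowSums(g)
bp0    <- bonacich_power(g, 0)        # beta = 0: proportional to degree
bp_neg <- bonacich_power(g, -0.3)     # beta < 0: ties to weak alters are rewarded

# beta just below 1 / (principal eigenvalue): close to eigenvector centrality
lambda1 <- max(eigen(g)$values)
bp_ev   <- bonacich_power(g, 0.999 / lambda1)
ev      <- abs(eigen(g)$vectors[, 1])

bp0 / deg                             # constant ratio across vertices
cor(bp_ev, ev)                        # close to 1
```

The system is solved with solve() rather than by forming the inverse explicitly; at exactly the reciprocal of the principal eigenvalue the matrix is singular, which is why the sketch stops slightly short of it.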

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. (“State,” in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
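The role definitions above amount to a classification of ordered triads, which can be sketched in base R as follows. This is an illustration only: brokerage_sketch is a hypothetical name, the code counts raw scores by brute force, and it makes no attempt to reproduce the null-model moments or z-tests computed by the brokerage function.

```r
# Count Gould-Fernandez brokerage roles for each vertex j over ordered triads
# (i, j, k) with i -> j, j -> k, and no direct i -> k tie
brokerage_sketch <- function(adj, state) {
  n <- nrow(adj)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative",
             "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in setdiff(which(adj[, j] != 0), j)) {
      for (k in setdiff(which(adj[j, ] != 0), c(i, j))) {
        if (adj[i, k] != 0) next          # i reaches k directly: no brokerage
        si <- state[i]; sj <- state[j]; sk <- state[k]
        role <- if (si == sj && sj == sk) {
          "coordinator"
        } else if (si == sk) {
          "itinerant"
        } else if (sj == sk) {
          "gatekeeper"
        } else if (si == sj) {
          "representative"
        } else {
          "liaison"
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, total = rowSums(out))
}

# Directed path 1 -> 2 -> 3 with two hypothetical state assignments
g <- matrix(0, 3, 3)
g[1, 2] <- 1
g[2, 3] <- 1
b_rep <- brokerage_sketch(g, c("a", "a", "b"))   # vertex 2 acts as representative
b_lia <- brokerage_sketch(g, c("a", "b", "c"))   # vertex 2 acts as liaison

# Closing the triad (1 -> 3) removes the brokerage opportunity
g_closed <- g
g_closed[1, 3] <- 1
b_none <- brokerage_sketch(g_closed, c("a", "a", "b"))
```

The triple loop is cubic in the worst case, which is acceptable for small illustrative graphs but would be a poor choice for large networks.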

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary’s graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score’s behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego’s dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow’s most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+     evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t     E(t)    Sd(t)       z Pr(>|z|)
w_I   5.0000   5.8638   2.7314 -0.3162   0.7518
w_O  25.0000  19.5459   7.0713  0.7713   0.4405
b_IO 18.0000  19.5459   6.2244 -0.2484   0.8039
b_OI 17.0000  19.5459   6.2244 -0.4090   0.6825
b_O  28.0000  23.4551   5.3349  0.8519   0.3943
t    93.0000  87.9565  13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the “edgewise” reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $(i, j, k) \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that “a friend of a friend is a friend”: where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of “third party support”: direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied, respectively, by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)

[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)

[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")

[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)

[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)

[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")

[1]   0  21 106 254 582

R> connectedness(g)

[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)

[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")

[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)

[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
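The three notions of reciprocity distinguished above (dyadic, non-null dyadic, and edgewise) can be made concrete with a few lines of base R. The following is a minimal sketch written for illustration only; it is not sna's implementation, the function name recip_sketch is hypothetical, and it assumes a single 0/1 adjacency matrix without loops or missing data.

```r
# Base-R sketch (not sna's implementation): three reciprocity notions for one
# directed graph, given as a 0/1 adjacency matrix with no loops.
recip_sketch <- function(adj) {
  ut <- upper.tri(adj)
  a <- adj[ut]      # tie i -> j for every dyad with i < j
  b <- t(adj)[ut]   # tie j -> i for the same dyads
  mut  <- sum(a == 1 & b == 1)      # mutual dyads
  null <- sum(a == 0 & b == 0)      # null dyads
  asym <- length(a) - mut - null    # asymmetric dyads
  c(dyadic         = (mut + null) / length(a),    # symmetric share of all dyads
    dyadic.nonnull = mut / (mut + asym),          # symmetric share of non-null dyads
    edgewise       = 2 * mut / (2 * mut + asym))  # reciprocated share of edges
}

# Toy digraph on four vertices: 1 <-> 2, 1 -> 3, 3 -> 4
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[1, 3] <- 1
adj[3, 4] <- 1
r <- recip_sketch(adj)
print(round(r, 3))
```

On this toy graph there are one mutual, two asymmetric, and three null dyads, so the three scores are 4/6, 1/3, and 1/2: the same structure looks fairly reciprocal under the dyadic definition and much less so once null dyads are excluded.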

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
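The equivalence of Equations 1 and 2 is easy to check numerically. The sketch below uses base R only; the helper names cent_sum and cent_mean are hypothetical, and the normalizing constant $(n-1)^2$ is an assumption that applies to outdegree on loopless digraphs (it is the total deviation attained by the out-star used in the example).

```r
# Raw Freeman centralization from a vector of centrality scores.
cent_sum  <- function(scores) sum(max(scores) - scores)        # Equation 1
cent_mean <- function(scores) {
  length(scores) * (max(scores) - mean(scores))                # Equation 2
}

# Outdegree scores for a directed star on 5 vertices: the hub sends a tie
# to each of the 4 leaves, and the leaves send none.
n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1
outdeg <- rowSums(star)

raw  <- cent_sum(outdeg)
raw2 <- cent_mean(outdeg)

# Assumed theoretical maximum deviation for outdegree without loops.
tmax <- (n - 1)^2
normalized <- raw / tmax
print(c(raw = raw, raw2 = raw2, normalized = normalized))
```

Both forms give a raw score of 16 here, and dividing by the assumed maximum yields a normalized centralization of 1, as expected for a star.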

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
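The definition of the structure statistics translates directly into code once geodesic distances are available. The following base-R sketch is illustrative only: geo_dist and struct_stats are hypothetical helper names, distances are obtained with a plain Floyd-Warshall pass (a substitute chosen for brevity, not a description of how geodist works internally), and unreachable pairs are assumed to be at infinite distance.

```r
# Geodesic distances by Floyd-Warshall relaxation; Inf marks unreachable pairs.
geo_dist <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n))
    for (i in seq_len(n))
      for (j in seq_len(n))
        d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
  d
}

# s_i = fraction of ordered pairs (j, k), including j = k, with d(j, k) <= i.
struct_stats <- function(adj) {
  n <- nrow(adj)
  d <- geo_dist(adj)
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed 4-cycle: 1 -> 2 -> 3 -> 4 -> 1
adj <- matrix(0, 4, 4)
adj[cbind(1:4, c(2:4, 1))] <- 1
s <- struct_stats(adj)
print(s)
```

For the directed 4-cycle each vertex reaches exactly one additional vertex per step, so the series is 0.25, 0.50, 0.75, 1.00. Note that $s_0$ is always $1/N$ under this definition, since every vertex is at distance zero from itself.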

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions consists of those which satisfy the conditions $\mathrm{E}\,s(H) = s(G), \mathrm{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
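The logic of a CUG test can be sketched without any package support. The example below is a simplified Monte Carlo version in base R, conditioning on order and on the exact number of edges; it is not the cugtest implementation, and the names mutuals, rgraph_nm, and cug_sketch are hypothetical. Loopless 0/1 digraphs are assumed throughout.

```r
# Number of mutual dyads in a loopless 0/1 digraph.
mutuals <- function(adj) sum(adj * t(adj)) / 2

# Uniform draw from the loopless digraphs with n vertices and exactly m edges.
rgraph_nm <- function(n, m) {
  adj <- matrix(0, n, n)
  off <- which(row(adj) != col(adj))   # off-diagonal cell indices
  adj[sample(off, m)] <- 1
  adj
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_sketch <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  m <- sum(adj)
  obs <- stat(adj)
  sim <- replicate(reps, stat(rgraph_nm(n, m)))
  c(obs = obs, p.upper = mean(sim >= obs), p.lower = mean(sim <= obs))
}

# Observed graph: three mutual dyads on six vertices (six edges in total).
set.seed(1)
adj <- matrix(0, 6, 6)
adj[cbind(c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 5))] <- 1
res <- cug_sketch(adj, mutuals, reps = 500)
print(res)
```

Because six edges can form at most three mutual dyads, the lower-tail estimate is necessarily 1 in this example, while the upper-tail estimate reflects how rarely a uniform draw with six edges is fully reciprocated.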

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
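The two-step procedure described above (positional distances followed by hierarchical clustering) can be sketched in base R. This is an illustrative simplification, not the sedist or equiv.clust code: the hypothetical helper se_hamming compares the rows and columns of each vertex pair while simply ignoring the entries that involve the pair itself, which is an assumption made here for brevity.

```r
# Hamming distance between the neighborhoods of every vertex pair,
# ignoring the cells that involve the two vertices being compared.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      keep <- setdiff(seq_len(n), c(i, j))
      d[i, j] <- d[j, i] <-
        sum(adj[i, keep] != adj[j, keep]) +   # differences in out-ties
        sum(adj[keep, i] != adj[keep, j])     # differences in in-ties
    }
  }
  d
}

# Vertices 1 and 2 both send ties to vertices 3 and 4, and nothing else occurs.
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1

d <- se_hamming(adj)
hc <- hclust(as.dist(d), method = "complete")  # stats::hclust
pos <- cutree(hc, k = 2)                       # two positions
print(d)
print(pos)
```

Here vertices 1 and 2 are exactly structurally equivalent (distance 0), as are vertices 3 and 4, so cutting the dendrogram into two clusters recovers the positions {1, 2} and {3, 4}.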

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i,j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
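Density-based reduction and expansion can be illustrated with a short base-R sketch. The helpers block_density and expand_density are hypothetical names and are not the blockmodel or blockmodel.expand code; the sketch assumes a loopless 0/1 digraph, classes numbered 1 to k, and at least two members per class (so that within-class densities are defined).

```r
# Reduce a graph to a k x k matrix of block densities, given class memberships.
block_density <- function(adj, cls) {
  k <- max(cls)
  dens <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      blk <- adj[cls == a, cls == b, drop = FALSE]
      if (a == b) {
        m <- nrow(blk)   # within-class block: leave out the diagonal cells
        dens[a, b] <- if (m > 1) sum(blk[row(blk) != col(blk)]) / (m * (m - 1)) else NA
      } else {
        dens[a, b] <- mean(blk)
      }
    }
  }
  dens
}

# Expand a density image into a new graph with the requested class sizes,
# treating each block density as a Bernoulli edge probability.
expand_density <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), sizes)
  p <- dens[cls, cls]                       # vertex-level probability matrix
  n <- length(cls)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0                              # no loops
  g
}

# Two classes of two vertices; class 1 sends to class 2 and nothing else occurs.
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1
dens <- block_density(adj, c(1, 1, 2, 2))
g <- expand_density(dens, sizes = c(3, 3))  # new graph with three vertices per class
print(dens)
print(g)
```

Because every block density in this example is exactly 0 or 1, the expanded graph is deterministic: all nine ties run from the first three vertices to the last three. With intermediate densities, repeated calls would instead yield independent draws from the implied inhomogeneous Bernoulli model.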

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
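Since composition is a boolean matrix product, it takes one line of base R to sketch. The function name compose below is hypothetical and is offered only to illustrate the definition, including the appearance of loops noted above.

```r
# Boolean inner product of two 0/1 adjacency matrices:
# (v, v') is an edge iff some v'' has v -> v'' in a and v'' -> v' in b.
compose <- function(a, b) (a %*% b > 0) * 1

# A single mutual dyad, 1 <-> 2, on three vertices (no loops).
g <- matrix(0, 3, 3)
g[1, 2] <- 1
g[2, 1] <- 1

gg <- compose(g, g)
print(gg)
```

Composing this graph with itself yields loops on vertices 1 and 2 (each reaches itself through the other) and no other edges, even though the input had an empty diagonal.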

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
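Equation 3 and the associated correlation and Hamming distance can be written out in base R as a check on the definitions. The names offdiag, gcov_sketch, gcor_sketch, and hdist_sketch are hypothetical; this is a sketch for two loopless digraphs on the same labeled vertex set, not the gcov, gcor, or hdist code.

```r
# Off-diagonal cells of an adjacency matrix, i.e., the edge variables.
offdiag <- function(a) a[row(a) != col(a)]

# Graph covariance (Equation 3): cross-products of centered edge variables,
# divided by the number of ordered pairs |V|(|V| - 1).
gcov_sketch <- function(a, b) {
  x <- offdiag(a)
  y <- offdiag(b)
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}

# Graph correlation: covariance scaled by the two graph standard deviations.
gcor_sketch <- function(a, b) {
  gcov_sketch(a, b) / sqrt(gcov_sketch(a, a) * gcov_sketch(b, b))
}

# Hamming distance: number of edge variables on which the graphs differ.
hdist_sketch <- function(a, b) sum(offdiag(a) != offdiag(b))

# A directed path 1 -> 2 -> 3 -> 4 and its (loopless) complement.
a <- matrix(0, 4, 4)
a[1, 2] <- a[2, 3] <- a[3, 4] <- 1
b <- 1 - a
diag(b) <- 0

print(c(cor_self = gcor_sketch(a, a),
        cor_comp = gcor_sketch(a, b),
        hamming  = hdist_sketch(a, b)))
```

A graph correlates perfectly with itself and at -1 with its complement, from which it differs on all 12 edge variables. The correlation also agrees with stats::cor applied to the off-diagonal cells, since the normalizing constants cancel in the ratio.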

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
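The QAP null hypothesis described above amounts to relabeling the vertices of one graph at random and recomputing the statistic. A minimal base-R sketch follows; qap_sketch and graph_cor are hypothetical names, the test is a plain Monte Carlo approximation rather than the qaptest implementation, and both inputs are assumed to be loopless digraphs of the same order with non-constant edge variables.

```r
# Graph correlation computed on off-diagonal cells.
graph_cor <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

# QAP test: compare the observed statistic with its distribution under
# random simultaneous permutation of the rows and columns of `a`.
qap_sketch <- function(a, b, stat, reps = 1000) {
  n <- nrow(a)
  obs <- stat(a, b)
  sim <- replicate(reps, {
    p <- sample(n)        # random relabeling of the vertices
    stat(a[p, p], b)      # the structure of `a` is preserved, only labels move
  })
  list(obs = obs, p.upper = mean(sim >= obs), p.lower = mean(sim <= obs))
}

# Compare a random digraph with an exact copy of itself.
set.seed(42)
a <- matrix(rbinom(64, 1, 0.4), 8, 8)
diag(a) <- 0
b <- a
res <- qap_sketch(a, b, graph_cor, reps = 200)
print(unlist(res))
```

Because the permutation leaves the internal structure of each graph intact, the reference distribution reflects only the alignment of labels between the two graphs; an observed correlation of 1 should therefore fall in the extreme upper tail unless the graph has many automorphisms.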


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a){1 / (1 + exp(-a))})
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former’s allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
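The correspondence between the two reporting models can be made concrete with a little arithmetic. The following is a minimal sketch only: the function names are hypothetical (they are not part of sna), and the mapping assumes the reading of the Batchelder-Romney model given above, in which a source knows the true edge value with probability D (competency) and otherwise guesses “present” with probability b (bias). Under that assumption a false negative rate em and false positive rate ep satisfy ep = (1 - D) b and em = (1 - D)(1 - b), so that D = 1 - em - ep, which is a valid competency only when em + ep < 1.

```r
# Hypothetical helpers (not sna functions): convert between the error-rate
# parameterization (em = false negative rate, ep = false positive rate)
# and a competency/bias parameterization.
errors_to_competency <- function(em, ep) {
  # The mapping is only defined for 0 < em + ep < 1
  stopifnot(all(em >= 0), all(ep >= 0), all(em + ep > 0), all(em + ep < 1))
  list(competency = 1 - em - ep,   # chance of "knowing" the true value
       bias = ep / (em + ep))      # chance that a guess reports an edge
}

competency_to_errors <- function(competency, bias) {
  list(em = (1 - competency) * (1 - bias),  # guess "absent" when edge present
       ep = (1 - competency) * bias)        # guess "present" when edge absent
}

# Round trip using the mean error rates from the example in this section
cb <- errors_to_competency(em = 0.375, ep = 0.038)
back <- competency_to_errors(cb$competency, cb$bias)
```

Error rates with em + ep >= 1 have no counterpart in this mapping, which is one way to see why only the error-rate form can represent negatively informative reports.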

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
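The conditional edge probability above is easy to evaluate directly. The sketch below is an illustration in base R under stated assumptions: the function name and argument names are hypothetical (this is not how bn or rgbn are implemented), p_base stands for Pr(B_e), p_bias for the vector of Pr(B_k), and s for the corresponding counts s_k(i, j, T).

```r
# Hypothetical helper (not part of sna): probability that i nominates j,
# given a baseline probability and a set of independent bias events.
biased_net_edge_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The edge is absent only if the baseline event and every potential
  # bias event all fail to occur.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# No bias events have potentially occurred: reduces to the baseline
p0 <- biased_net_edge_prob(0.15, p_bias = c(0.05, 0.1), s = c(0, 0))

# One event of the first type and two of the second have potentially occurred
p1 <- biased_net_edge_prob(0.15, p_bias = c(0.05, 0.1), s = c(1, 2))
# 1 - 0.85 * 0.95 * 0.9^2 = 0.345925
```

Because each additional potential bias event multiplies the probability of absence by a factor below 1, the edge probability can only rise as the counts s_k grow.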

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

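$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$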

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter’s inclusion of impact from neighbors’ covariate scores: an AR term implies that each individual’s response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran’s I (Moran 1950) and Geary’s C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
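To give a sense of what an autocorrelation index measures, the following base R sketch computes Moran’s I from its textbook definition, I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z = y - mean(y) and S0 is the sum of all weights. The function name is hypothetical and the code is only an illustration; it is not the implementation used by nacf, whose neighborhood handling and normalization options are richer.

```r
# Hypothetical helper (not part of sna): Moran's I for an attribute vector y
# and an n x n weight (or adjacency) matrix w.
morans_i <- function(y, w) {
  stopifnot(nrow(w) == length(y), ncol(w) == length(y))
  z <- y - mean(y)                       # centered responses
  n <- length(y)
  # Weighted cross-products of neighbors, scaled by the total variation
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Undirected path 1 - 2 - 3 - 4 with a smoothly increasing attribute
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
y <- c(1, 2, 3, 4)
i_path <- morans_i(y, w)   # equals 1/3 for this example
```

Positive values indicate that adjacent vertices tend to have similar attribute values, which is the kind of pattern that would motivate including the corresponding matrix as a weight matrix in lnam.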


Example

To demonstrate the use of sna’s network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants’ false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user’s modeling assumptions and of the data itself.
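One way to check how much prior mass falls in the “perverse” region is to integrate it numerically. The sketch below is an illustration under stated assumptions: the function name is hypothetical (sna does not provide it), and the false negative and false positive rates are assumed to have independent beta priors, as in the example above. It uses only base R (dbeta, pbeta, integrate).

```r
# Hypothetical helper (not part of sna): prior probability that em + ep > 1
# when em ~ Beta(a_em, b_em) and ep ~ Beta(a_ep, b_ep) independently.
perverse_prior_mass <- function(a_em, b_em, a_ep, b_ep) {
  # For each value x of em, ep must exceed 1 - x
  integrand <- function(x) {
    dbeta(x, a_em, b_em) * pbeta(1 - x, a_ep, b_ep, lower.tail = FALSE)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

# Priors used in the example: beta(2, 11) for both error rates
mass_example <- perverse_prior_mass(2, 11, 2, 11)

# Flat priors on both rates place half of their mass in the perverse region
mass_flat <- perverse_prior_mass(1, 1, 1, 1)
```

For the beta(2, 11) priors the result is well under one percent, whereas flat priors put half of their mass on em + ep > 1, which is the kind of specification the text cautions against.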

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
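The sensitivity of the central graph to error rates near 0.5 follows from how it is formed: under the Hamming metric, the graph minimizing total distance to the observed reports is the edgewise majority vote. A minimal base R sketch is given below; the function name is hypothetical, the stack layout (informant x sender x receiver) follows the dat array used above, and counting exact ties as edges is an assumption made here for illustration rather than a statement about the internals of consensus.

```r
# Hypothetical helper (not part of sna): edgewise majority vote over a
# stack of reported graphs, stored as an m x n x n array.
majority_graph <- function(dat) {
  # Fraction of informants reporting each (i, j) edge
  share <- apply(dat, c(2, 3), mean)
  # Assumption: an edge is kept when at least half of the reports include it
  (share >= 0.5) * 1
}

# Three informants reporting on a two-vertex digraph
dat_small <- array(0, dim = c(3, 2, 2))
dat_small[1, 1, 2] <- 1   # informant 1 reports 1 -> 2
dat_small[2, 1, 2] <- 1   # informant 2 reports 1 -> 2
dat_small[3, 2, 1] <- 1   # informant 3 reports 2 -> 1 instead
est <- majority_graph(dat_small)
```

An edge that each informant misses with probability close to 0.5 will fail the majority vote about half the time regardless of how many informants there are, which is why this estimator breaks down in that regime while the likelihood-based methods need not.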

As a final example of sna’s model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i’s net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i’s net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems. Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

 [9,] 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 0 0 0 0 0

When not in “exact” mode, rguman draws dyads as independent multinomial random variables with specified type probabilities. This can be used to obtain random structures with varying degrees of bias toward or away from mutuality. Thus, to obtain a random graph in which reciprocated ties are overrepresented, one might use a model like the following:

R> g <- rguman(1, 100, mut = 0.15, asym = 0.05, null = 0.8)
R> mean(g[upper.tri(g)] * t(g)[upper.tri(g)])
[1] 0.1482828
R> mean(g[upper.tri(g)] != t(g)[upper.tri(g)])
[1] 0.04646465
R> mean((!g)[upper.tri(g)] * t(!g)[upper.tri(g)])
[1] 0.8052525

By contrast with the expectation under the above model, a Bernoulli graph with the same expected density would have a mean mutuality rate of approximately 0.03 (with asymmetric dyads outnumbering mutual dyads by a factor of approximately 9.4). Thus, the behavior of the multinomial dyad model can deviate substantially from that of the Bernoulli graph family, despite their underlying similarity.
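The Bernoulli comparison figures can be reproduced with a few lines of arithmetic. The sketch below uses base R only; the variable names are chosen here for illustration. It relies on the fact that a mutual dyad contributes two edges and an asymmetric dyad one, so the expected density of the multinomial dyad model is mut + asym / 2, and that a Bernoulli graph with density d has independent edges within each dyad.

```r
# Dyad type probabilities used in the rguman call above
mut <- 0.15
asym <- 0.05

# Expected density: each mutual dyad carries two edges, each asymmetric one
dens <- mut + asym / 2            # 0.175

# Dyad census expectations for a Bernoulli graph with the same density
bern_mut <- dens^2                # both edges present: about 0.03
bern_asym <- 2 * dens * (1 - dens)  # exactly one edge present
ratio <- bern_asym / bern_mut     # asymmetric-to-mutual ratio: about 9.4
```

In the multinomial model as specified, by contrast, mutual dyads outnumber asymmetric dyads by three to one (0.15 versus 0.05).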

More extensive departures from independence require alternatives to the simple independent edge/dyad paradigm. One such alternative is the Skvoretz-Fararo family of biased net processes, which are discussed in more detail in Section 2.7. As we will see, these processes are specified in terms of the conditional probability of an edge given other edges within the graph; this immediately suggests the use of a Gibbs sampler (see, e.g., Gilks et al. 1996) to draw realizations of the graph process. Such a sampler is implemented via the rgbn function, which uses an iterative edge updating scheme to form a Markov chain whose equilibrium distribution corresponds to the distribution of (directed) graphs resulting from the Skvoretz-Fararo process. Thinning and burn-in parameters may be specified by the user, along with model parameters (which by default correspond to the uniform random digraph model). Parameters may be adjusted to produce “parent” or reciprocity biases (π), “sibling” or shared partner biases (σ), and “double role” biases or parent/sibling interaction effects (ρ), as well as baseline density effects (d); parameters vary from 0 to 1, with 0 indicating no bias. The command to draw a sample of 5 order 10 networks with both reciprocity and triangle formation biases will then look something like the following:

R> g <- rgbn(5, 10, param = list(pi = 0.05, sigma = 0.1, rho = 0.05,
+    d = 0.15))

with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) “rewiring” process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order. rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
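For readers who want to see the operation itself, the following base R sketch applies the same permutation to the rows and columns of an adjacency matrix. It is an illustration only, not the rmperm implementation; permute_vertices is a hypothetical helper name.

```r
# Relabel the vertices of an adjacency matrix by a permutation vector.
# The same permutation is applied to rows and columns.
permute_vertices <- function(adj, perm = sample(nrow(adj))) {
  adj[perm, perm, drop = FALSE]
}

set.seed(1)
adj <- matrix(rbinom(36, 1, 0.3), 6, 6)
diag(adj) <- 0
perm <- sample(6)
adj_perm <- permute_vertices(adj, perm)

# The degree sequences are unchanged up to reordering.
print(all(sort(rowSums(adj)) == sort(rowSums(adj_perm))))
print(all(sort(colSums(adj)) == sort(colSums(adj_perm))))
# The number of edges is unchanged as well.
print(sum(adj) == sum(adj_perm))
```

Because only the vertex labels move, any statistic that does not depend on labeling takes the same value before and after permutation.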

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency. eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, but does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
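The generic change-score idea can be sketched in base R as follows. This is a minimal illustration under our own assumptions (single edge, binary adjacency matrix); edge_perturbation and mutual_count are hypothetical names and do not reproduce the interface of eval.edgeperturbation.

```r
# Difference in a graph statistic between forcing edge (i, j) present
# and forcing it absent, holding all other edges fixed.
# `stat` is any function mapping an adjacency matrix to a number.
edge_perturbation <- function(adj, i, j, stat, ...) {
  present <- adj
  present[i, j] <- 1
  absent <- adj
  absent[i, j] <- 0
  stat(present, ...) - stat(absent, ...)
}

# Illustrative statistic: the number of mutual dyads (loops excluded).
mutual_count <- function(adj) {
  diag(adj) <- 0
  sum(adj * t(adj)) / 2
}

adj <- matrix(0, 4, 4)
adj[2, 1] <- 1
# Adding 1 -> 2 reciprocates the existing 2 -> 1 edge.
print(edge_perturbation(adj, 1, 2, mutual_count))
# Adding 1 -> 3 reciprocates nothing, since 3 -> 1 is absent.
print(edge_perturbation(adj, 1, 3, mutual_count))
```

Note that the statistic is recomputed twice from scratch for every toggled edge, which is exactly the inefficiency described above.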

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
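A class-restricted permutation of the kind just described can be sketched in base R. The function below is our own illustration (restricted_perm is a hypothetical name, and its argument format is an assumption rather than the rperm interface): it shuffles positions independently within each class.

```r
# Draw a random permutation vector in which positions may only be
# exchanged with other positions in the same equivalence class.
# `classes` holds one class label per vertex (hypothetical input format).
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    if (length(idx) > 1) {
      # Uniform shuffle of the positions belonging to this class.
      perm[idx] <- idx[sample.int(length(idx))]
    }
  }
  perm
}

set.seed(42)
classes <- c("a", "a", "b", "b", "b", "c")
perm <- restricted_perm(classes)
print(perm)
# Every position still holds a vertex of its original class.
print(all(classes[perm] == classes))
```

Since each class is shuffled uniformly and independently, the result is uniform over the set of permutations that respect the class structure.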

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex v in graph G is defined as $G[v \cup N(v)]$ (i.e., the subgraph of G induced by v and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.
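The induced-subgraph definition translates directly into base R. The sketch below is illustrative only (ego_net is a hypothetical name, not the ego.extract implementation); following the convention used in the examples in this section, ego is placed in the first row and column.

```r
# Extract the egocentric network of vertex v from adjacency matrix adj.
# type = "out" uses out-neighbors, "in" uses in-neighbors, and
# "combined" uses their union. Ego occupies the first row/column.
ego_net <- function(adj, v, type = c("combined", "in", "out")) {
  type <- match.arg(type)
  out_nb <- which(adj[v, ] != 0)
  in_nb <- which(adj[, v] != 0)
  nb <- switch(type,
    out = out_nb,
    "in" = in_nb,
    combined = union(out_nb, in_nb)
  )
  members <- c(v, setdiff(nb, v))
  adj[members, members, drop = FALSE]
}

adj <- matrix(0, 5, 5)
adj[1, 2] <- 1
adj[3, 1] <- 1
adj[2, 3] <- 1
adj[4, 5] <- 1
print(ego_net(adj, 1, "out"))       # ego plus its single out-neighbor
print(ego_net(adj, 1, "combined"))  # ego plus in- and out-neighbors
```

The number of rows of each extracted matrix is one more than the corresponding degree of ego, which gives a quick sanity check on the extraction.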

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.
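The two-step logic (identify the neighborhood, then apply a function to the neighbors' covariates) can be sketched in base R as below. This is a simplified illustration restricted to first-order neighborhoods; neighbor_apply is a hypothetical name, and the use of 1 for rows (out-neighbors) and 2 for columns (in-neighbors) is an assumption made for this sketch.

```r
# Apply FUN to the covariate values of each vertex's neighbors.
# margin = 1 uses out-neighbors (rows), margin = 2 uses in-neighbors
# (columns), and c(1, 2) uses their union. First-order neighborhoods only.
neighbor_apply <- function(adj, margin, covariate, FUN, ...) {
  n <- nrow(adj)
  sapply(seq_len(n), function(v) {
    nb <- integer(0)
    if (1 %in% margin) nb <- union(nb, which(adj[v, ] != 0))
    if (2 %in% margin) nb <- union(nb, which(adj[, v] != 0))
    nb <- setdiff(nb, v)  # ego is never its own neighbor here
    FUN(covariate[nb], ...)
  })
}

set.seed(7)
adj <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(adj) <- 0
# Summing a vector of ones over out-neighbors recovers outdegree.
print(all(neighbor_apply(adj, 1, rep(1, 6), sum) == rowSums(adj)))
# Mean covariate value among all (in or out) neighbors of each vertex.
print(neighbor_apply(adj, c(1, 2), 1:6, mean))
```

Vertices with an empty neighborhood receive whatever FUN returns on an empty vector (NaN for mean, 0 for sum).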

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose i, j cell is an indicator for the membership of vertex j in vertex i's selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix and, for orders k > 0, depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation R be given by the underlying graph of G (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of G if incoming neighborhoods are sought, or G otherwise. The partial neighborhood structure of order k > 0 on R is then defined to be the digraph on V whose edge set consists of the ordered pairs (i, j) having geodesic distance k in R. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to k in R. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
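The definitions above can be made concrete with a short base R sketch built on breadth-first search. It is an illustration under our own assumptions, not the neighborhood implementation: the function names are hypothetical, and self-pairs are excluded from cumulative neighborhoods of order k > 0.

```r
# Geodesic distance matrix by breadth-first search from each vertex.
geodesic_dist <- function(rel) {
  n <- nrow(rel)
  dist <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    dist[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      reached <- which(colSums(rel[frontier, , drop = FALSE]) > 0)
      new <- reached[is.infinite(dist[s, reached])]
      dist[s, new] <- d
      frontier <- new
    }
  }
  dist
}

# Indicator matrix for partial or cumulative neighborhoods of order k.
neighborhood_matrix <- function(adj, k, type = c("out", "in", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  adj <- (adj != 0) * 1
  # Base relation R: G, its transpose, or the underlying graph of G.
  rel <- switch(type,
    out = adj,
    "in" = t(adj),
    total = (adj + t(adj) > 0) * 1
  )
  if (k == 0) return(diag(nrow(adj)))
  dist <- geodesic_dist(rel)
  # Assumption: pairs at distance 0 are left out of cumulative neighborhoods.
  if (partial) (dist == k) * 1 else (dist > 0 & dist <= k) * 1
}

# Directed path 1 -> 2 -> 3 -> 4.
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
print(neighborhood_matrix(adj, 2, "out"))                   # distance exactly 2
print(neighborhood_matrix(adj, 2, "out", partial = FALSE))  # distance 1 or 2
```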

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5/9)
R> gin <- ego.extract(g, neighborhood = "in")
R> gout <- ego.extract(g, neighborhood = "out")
R> gcomb <- ego.extract(g, neighborhood = "combined")
R> gcomb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(gin, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(gout, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(gcomb, NROW) <= degree(g) + 1)
[1] TRUE
R> egosize <- sapply(gcomb, NROW)
R> if(any(egosize > 2))
+   sapply(gcomb[egosize > 2], function(x){gden(x[-1, -1])})
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v' in the ith panel iff v' belongs to the ith order partial neighborhood of v.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v' in the ith panel iff v' belongs to the ith order cumulative neighborhood of v.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which f is defined (with associated vertex set V). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected G. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d^t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in G, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in G containing $v'$, and the ratio $g'(v', v, v'', G) / g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex v to vertex v'. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any v lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
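The degree and closeness definitions are simple enough to sketch in base R. The code below is an illustration under our own assumptions (hypothetical function names, binary adjacency matrices, geodesics computed by a Floyd-Warshall pass, unreachable pairs at infinite distance); it is not the sna implementation.

```r
# Geodesic distances via the Floyd-Warshall recursion; unreachable
# pairs are left at Inf.
geodesics <- function(adj) {
  d <- ifelse(adj != 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(nrow(adj))) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# Outdegree, indegree, and total ("Freeman") degree.
freeman_degree <- function(adj) {
  adj <- (adj != 0) * 1
  diag(adj) <- 0
  list(outdegree = rowSums(adj), indegree = colSums(adj),
       total = rowSums(adj) + colSums(adj))
}

# Closeness: (n - 1) divided by the sum of geodesic distances from v.
closeness_scores <- function(adj) {
  (nrow(adj) - 1) / rowSums(geodesics(adj))
}

# A directed 3-cycle is strongly connected.
cyc <- matrix(0, 3, 3)
cyc[cbind(1:3, c(2, 3, 1))] <- 1
print(closeness_scores(cyc))

# Adding an isolate makes some distance infinite for every vertex.
with_isolate <- matrix(0, 4, 4)
with_isolate[1:3, 1:3] <- cyc
print(closeness_scores(with_isolate))
```

The second call shows the fragility noted above: a single unreachable vertex drives every closeness score to zero.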

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of A (where A is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., v is central to the extent to which it has many ties to other central nodes), or of the ability of v to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as β approaches the reciprocal of the principal eigenvalue of A, and degree as β approaches 0. Setting β < 0 reverses the sense of the dependence of centrality scores across vertices: where β is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of β. The scaling parameter α is by convention set so as to result in a centrality vector of length equal to $\sqrt{|V|}$ (i.e., the squared scores sum to $|V|$); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
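The Bonacich formula can be evaluated directly with a linear solve. The following base R sketch is our own illustration (bonacich_power is a hypothetical name, not the bonpow implementation), and the choice of scaling, with squared scores summing to the number of vertices, is an assumption of the sketch.

```r
# Bonacich power scores: alpha * (I - beta * A)^{-1} A 1.
bonacich_power <- function(adj, beta) {
  n <- nrow(adj)
  # Solve (I - beta * A) x = A 1 rather than forming the inverse.
  raw <- as.vector(solve(diag(n) - beta * adj, adj %*% rep(1, n)))
  # Assumed scaling: squared scores sum to the number of vertices.
  raw * sqrt(n / sum(raw^2))
}

# Directed 5-ring with one extra chord from vertex 1 to vertex 3.
adj <- matrix(0, 5, 5)
adj[cbind(1:5, c(2:5, 1))] <- 1
adj[1, 3] <- 1

bp0 <- bonacich_power(adj, 0)
# With beta = 0 the scores are proportional to outdegree.
print(bp0 / rowSums(adj))

# A negative beta rewards ties to weakly connected alters.
print(bonacich_power(adj, -0.5))
```

Solving the linear system fails when I - βA is singular, which corresponds to the caveat that the measure is defined only where a solution exists.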

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex v is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$, that is, the number of pairs for which v serves as a local bridge. Now, let us posit a vector of states s on V, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex v with respect to a particular role is defined as the number of ordered triads of the appropriate type for which v is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed s and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
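The role definitions above amount to a classification of open two-paths, which the following base R sketch counts by brute force. It is an illustration only: brokerage_counts is a hypothetical name, the cubic-time triple loop is chosen for clarity rather than speed, and none of the null-model moments or z-tests are computed.

```r
# Count Gould-Fernandez brokerage roles for every vertex.
# adj: binary adjacency matrix; state: one state (group label) per vertex.
brokerage_counts <- function(adj, state) {
  n <- nrow(adj)
  roles <- c("coordinating", "itinerant", "gatekeeping",
             "representative", "liaison", "total")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers (i, k) when i -> j and j -> k exist but i -> k does not.
    if (adj[i, j] == 0 || adj[j, k] == 0 || adj[i, k] != 0) next
    si <- state[i]
    sj <- state[j]
    sk <- state[k]
    role <- if (si == sj && sj == sk) {
      "coordinating"
    } else if (si == sk) {
      "itinerant"
    } else if (sj == sk) {
      "gatekeeping"
    } else if (si == sj) {
      "representative"
    } else {
      "liaison"
    }
    out[j, role] <- out[j, role] + 1
    out[j, "total"] <- out[j, "total"] + 1
  }
  out
}

# Directed path 1 -> 2 -> 3 -> 4 with three groups.
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
state <- c(1, 1, 2, 3)
scores <- brokerage_counts(adj, state)
print(scores)
```

In this toy network vertex 2 brokers the pair (1, 3) and vertex 3 brokers the pair (2, 4); the printed table shows which role each falls into.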

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores, for instance when all vertices are automorphically equivalent) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.
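The step from equation (1) to equation (2) is short enough to spell out. The following only expands the sum, using the definitions of $c^*(G)$ and $\bar{c}(G)$ given above:

```latex
\begin{aligned}
C(G) &= \sum_{i=1}^{|V|} \left[ c^*(G) - c(v_i, G) \right] \\
     &= |V|\, c^*(G) - \sum_{i=1}^{|V|} c(v_i, G) \\
     &= |V|\, c^*(G) - |V|\, \bar{c}(G) \\
     &= |V| \left[ c^*(G) - \bar{c}(G) \right].
\end{aligned}
```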

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied, respectively, by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

Footnote 3: $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
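The differences among these indices are easy to check by hand on a small graph. The following base-R sketch does not call sna; the toy matrix and variable names are illustrative assumptions. It computes density, the three reciprocity variants described above, and a weak-transitivity score directly from their definitions. sna's own routines additionally handle missing data, loops, and graph stacks, so this is a reading aid rather than a substitute.

```r
# Toy digraph on 4 vertices (no loops); a[i, j] = 1 means i sends a tie to j.
a <- matrix(0, 4, 4)
a[1, 2] <- a[2, 1] <- 1   # mutual dyad {1, 2}
a[2, 3] <- 1              # asymmetric dyad {2, 3}
a[1, 3] <- 1              # asymmetric dyad {1, 3}; closes the two-path 1 -> 2 -> 3
n <- nrow(a)

# Density: edges present over ordered pairs that could hold an edge.
density <- sum(a) / (n * (n - 1))

# Dyad counts, visiting each unordered pair once via the upper triangle.
ut   <- upper.tri(a)
mut  <- sum(a[ut] == 1 & t(a)[ut] == 1)
null <- sum(a[ut] == 0 & t(a)[ut] == 0)
asym <- choose(n, 2) - mut - null

recip_dyadic   <- (mut + null) / choose(n, 2)  # symmetric dyads among all dyads
recip_nonnull  <- mut / (mut + asym)           # symmetric dyads among non-null dyads
recip_edgewise <- 2 * mut / sum(a)             # reciprocated edges among all edges

# Weak transitivity: among ordered triples with i -> j -> k (i != k),
# the share for which i -> k is also present.
two_paths <- a %*% a
diag(two_paths) <- 0
trans_weak <- sum(two_paths * a) / sum(two_paths)

c(density = density, dyadic = recip_dyadic, nonnull = recip_nonnull,
  edgewise = recip_edgewise, trans_weak = trans_weak)
```

Here the dyadic score (4/6) is much higher than the edgewise score (1/2) because three of the six dyads are null, which is the same pattern visible for the sparse graphs in the sna output above.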

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if (tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
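To make the calling convention concrete, the following base-R sketch mimics the protocol with a hypothetical stand-in wrapper, toy_centralization (not sna's implementation), and a hand-written outdegree function. The only assumption carried over from the text is that the centrality function returns its theoretical maximum deviation when called with tmaxdev = TRUE.

```r
# Hypothetical stand-in for the wrapper: FUN must accept tmaxdev and return
# either the score vector or the theoretical maximum deviation.
toy_centralization <- function(dat, FUN, normalize = TRUE, ...) {
  scores <- FUN(dat, ...)
  raw <- sum(max(scores) - scores)        # equation (1)
  if (!normalize) return(raw)
  raw / FUN(dat, tmaxdev = TRUE, ...)     # divide by the maximum deviation
}

# Outdegree centrality for a loopless digraph given as an adjacency matrix.
outdeg <- function(dat, tmaxdev = FALSE, ...) {
  n <- NROW(dat)
  if (tmaxdev) return((n - 1)^2)          # attained by the out-star
  rowSums(dat)
}

star <- matrix(0, 5, 5); star[1, -1] <- 1                   # vertex 1 sends to all others
ring <- matrix(0, 5, 5); ring[cbind(1:5, c(2:5, 1))] <- 1   # directed 5-cycle

toy_centralization(star, outdeg)                      # 1: maximally centralized
toy_centralization(ring, outdeg)                      # 0: all scores equal
toy_centralization(star, outdeg, normalize = FALSE)   # 16 = 5 * (4 - 0.8), as in equation (2)
```

Keeping the maximum deviation inside the centrality function, rather than in the wrapper, is what lets a user-defined index be normalized without any change to the wrapper itself.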

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
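The definition of $s_i$ translates almost directly into code. The base-R sketch below (hypothetical helper names; it uses a plain Floyd-Warshall pass rather than whatever geodist does internally) computes geodesic distances from an adjacency matrix and then tabulates the structure statistics.

```r
# All-pairs geodesic distances for an unweighted digraph (Inf = unreachable).
geodesic_distances <- function(a) {
  n <- nrow(a)
  d <- ifelse(a > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n))                      # allow paths routed through vertex k
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d
}

# s_i = fraction of ordered pairs (j, k), including j = k, with d(j, k) <= i.
structure_stats <- function(a) {
  n <- nrow(a)
  d <- geodesic_distances(a)
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed path 1 -> 2 -> 3.
path <- matrix(0, 3, 3)
path[1, 2] <- path[2, 3] <- 1
s <- structure_stats(path)
s   # 3/9, 5/9, 6/9
```

Because each vertex is at distance 0 from itself, $s_0$ is always $1/N$; the series then rises toward 1 only if every ordered pair is eventually reachable, which the directed path above is not.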

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$ Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$ These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
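The logic of a CUG test can be sketched in a few lines of base R. The helper below is a hypothetical illustration, not the cugtest interface: it conditions on order and number of edges by placing the observed number of edges uniformly at random among the off-diagonal cells, and estimates the two one-tailed p-values by simple Monte Carlo.

```r
# Monte Carlo CUG test for a loopless digraph, conditioning on order and edge count.
cug_pvalues <- function(a, stat, reps = 1000) {
  n <- nrow(a)
  m <- sum(a)
  off <- which(row(a) != col(a))        # linear indices of off-diagonal cells
  obs <- stat(a)
  sim <- replicate(reps, {
    h <- matrix(0, n, n)
    h[sample(off, m)] <- 1              # uniform draw given n vertices and m edges
    stat(h)
  })
  c(upper = mean(sim >= obs), lower = mean(sim <= obs))
}

# Test statistic: number of mutual dyads.
mutual_count <- function(a) sum(a * t(a)) / 2

# Observed graph: six edges arranged as three mutual dyads.
a <- matrix(0, 6, 6)
a[1, 2] <- a[2, 1] <- a[3, 4] <- a[4, 3] <- a[5, 6] <- a[6, 5] <- 1

set.seed(42)
res <- cug_pvalues(a, mutual_count, reps = 1000)
res
```

With only six edges among 30 possible cells, three mutual dyads are very unlikely under this null, so the upper-tail p-value should be small; the exact Monte Carlo estimate depends on the seed.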

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00
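For the undirected case mentioned earlier, the four triad classes are distinguished simply by the number of edges present among the three vertices (0 to 3). The following base-R sketch, with a hypothetical helper name and brute-force enumeration that is only sensible for small graphs, counts them directly:

```r
# Undirected triad census: classify every vertex triple by its edge count.
undirected_triad_census <- function(a) {
  a <- (a + t(a)) > 0                       # treat any tie as an undirected edge
  triples <- combn(nrow(a), 3)              # one column per vertex triple
  edges <- apply(triples, 2, function(v) sum(a[v, v][upper.tri(diag(3))]))
  table(factor(edges, levels = 0:3))
}

# A triangle on vertices 1-3 plus an isolate (vertex 4).
tri <- matrix(0, 4, 4)
tri[1, 2] <- tri[2, 3] <- tri[1, 3] <- 1
census <- undirected_triad_census(tri)
census   # three triples with one edge, one triple with three edges
```

The counts always sum to choose(n, 3), which gives a quick sanity check (120 for the ten-vertex graphs in the example above).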

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:    0
        Mean:  -0.001288889
        3rdQ:   0.06666667
        Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:   0.1555556
        Med:    0.2222222
        Mean:   0.2215333
        3rdQ:   0.2888889
        Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel, see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
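The distance-then-cluster pipeline described above can be sketched in base R. The function below is a hypothetical illustration of a Hamming-type structural equivalence distance; the treatment of the tie between the two vertices being compared is one reasonable convention and is an assumption here, not necessarily what sedist does. The result is passed to hclust and cut into positions with cutree.

```r
# Hamming-type structural equivalence distance between all vertex pairs.
se_hamming <- function(a) {
  n <- nrow(a)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    others <- setdiff(seq_len(n), c(i, j))
    d[i, j] <- sum(a[i, others] != a[j, others]) +   # out-ties to third parties
               sum(a[others, i] != a[others, j]) +   # in-ties from third parties
               (a[i, j] != a[j, i])                  # assumed convention for the i-j dyad
  }
  d
}

# Vertices 1 and 2 both send to 3 and 4; there are no other ties.
a <- matrix(0, 4, 4)
a[1:2, 3:4] <- 1

d <- se_hamming(a)
d                                        # zeros mark exactly equivalent pairs

positions <- cutree(hclust(as.dist(d), method = "complete"), k = 2)
positions                                # {1, 2} and {3, 4}
```

A distance of zero corresponds to exact structural equivalence; in real data one would instead inspect the dendrogram and choose a cut height that trades off fidelity against the number of positions.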

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
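Density reduction and expansion are simple enough to sketch in base R. The two helpers below are hypothetical illustrations, not the blockmodel or blockmodel.expand code: the first averages cells within each block (skipping the diagonal within a class), and the second treats those densities as Bernoulli edge probabilities for a new graph whose class sizes may differ from the original.

```r
# Density blockmodel from an adjacency matrix and a class membership vector.
block_density <- function(a, membership) {
  k <- max(membership)
  dens <- matrix(NA_real_, k, k)
  for (r in seq_len(k)) for (s in seq_len(k)) {
    cells <- a[membership == r, membership == s, drop = FALSE]
    if (r == s) {
      m <- nrow(cells)                   # within-class block: ignore the diagonal
      if (m > 1) dens[r, s] <- (sum(cells) - sum(diag(cells))) / (m * (m - 1))
    } else {
      dens[r, s] <- mean(cells)
    }
  }
  dens
}

# Expand a density image into a random digraph with the given class sizes.
expand_blocks <- function(dens, sizes) {
  membership <- rep(seq_along(sizes), sizes)
  p <- dens[membership, membership]      # per-cell edge probabilities
  p[is.na(p)] <- 0                       # undefined blocks contribute no ties
  n <- length(membership)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0
  g
}

a <- matrix(0, 4, 4)
a[1:2, 3:4] <- 1                         # class 1 sends to class 2
dens <- block_density(a, c(1, 1, 2, 2))
dens
g <- expand_blocks(dens, sizes = c(3, 3))  # three vertices per class instead of two
g
```

Because the block densities here are all 0 or 1, the expanded graph is deterministic; with intermediate densities each call would return a different draw.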

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
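The boolean inner product, and the way loops can appear, is easy to see in a two-vertex example. The compose function below is a hypothetical base-R illustration written for this purpose, not sna's operator:

```r
# Boolean matrix product: (v, v') is tied if some v'' links them across the two graphs.
compose <- function(a, b) ((a %*% b) > 0) * 1

g  <- matrix(0, 2, 2); g[1, 2]  <- 1   # G:  1 -> 2
gp <- matrix(0, 2, 2); gp[2, 1] <- 1   # G': 2 -> 1

compose(g, gp)   # a loop at vertex 1 (1 -> 2 in G, then 2 -> 1 in G')
compose(gp, g)   # a loop at vertex 2: composition is not commutative
```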

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left( A^G_{ij} - \mu_G \right) \left( A^H_{ij} - \mu_H \right)}{|V| \left( |V| - 1 \right)}, \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left( |V| (|V| - 1) \right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\, \mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
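Equation (3) and the associated quantities reduce to ordinary moments of the off-diagonal cells. The base-R sketch below uses hypothetical helper names and handles only the directed, loopless, single-pair case; it checks that the graph correlation coincides with the Pearson correlation of the vectorized off-diagonal entries.

```r
offdiag <- function(m) m[row(m) != col(m)]       # the |V|(|V| - 1) edge variables

graph_cov <- function(a, b) {
  x <- offdiag(a); y <- offdiag(b)
  mean((x - mean(x)) * (y - mean(y)))            # equation (3)
}
graph_cor <- function(a, b)
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
hamming <- function(a, b) sum(offdiag(a) != offdiag(b))

a <- matrix(0, 4, 4)
a[1, 2] <- a[2, 3] <- a[3, 4] <- a[1, 3] <- 1
b <- matrix(0, 4, 4)
b[1, 2] <- b[2, 3] <- b[4, 1] <- 1
comp <- 1 - a; diag(comp) <- 0                   # complement of a

graph_cor(a, b)      # agrees with cor(offdiag(a), offdiag(b))
graph_cor(a, comp)   # -1: every edge variable is flipped
hamming(a, b)        # 3 edge changes take a into b
hamming(a, comp)     # 12: all off-diagonal cells differ
```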

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
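For very small graphs, the maximization over matchings can be done exhaustively, which makes the idea of a structural correlation concrete. The base-R sketch below uses hypothetical helpers and brute force over all vertex permutations, so it is feasible only for a handful of vertices; it takes the labeled graph correlation to be the Pearson correlation of the off-diagonal cells.

```r
offdiag <- function(m) m[row(m) != col(m)]

# All permutations of a vector, built recursively (n! of them).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

# Structural correlation: best labeled correlation over all relabelings of b.
structural_cor <- function(a, b) {
  best <- -Inf
  for (p in perms(seq_len(nrow(b))))
    best <- max(best, cor(offdiag(a), offdiag(b[p, p])))
  best
}

a <- matrix(0, 4, 4)
a[1, 2] <- a[2, 3] <- a[3, 4] <- a[1, 3] <- 1
p0 <- c(3, 1, 4, 2)
b <- a[p0, p0]                       # an isomorphic, relabeled copy of a

cor(offdiag(a), offdiag(b))          # labeled correlation: below 1
structural_cor(a, b)                 # 1: some relabeling matches a exactly
```

The gap between the two values is the part of the correspondence that depends on labeling alone; heuristic search replaces the exhaustive loop when the number of permutations becomes unmanageable.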
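The QAP null hypothesis described above can be sketched as a permutation test in base R. The helper below is a hypothetical illustration, not the qaptest interface: it relabels the vertices of one graph at random (permuting rows and columns together) and recomputes the graph correlation each time.

```r
offdiag <- function(m) m[row(m) != col(m)]

# Monte Carlo QAP test for the graph correlation between a and b.
qap_cor_test <- function(a, b, reps = 1000) {
  obs <- cor(offdiag(a), offdiag(b))
  n <- nrow(b)
  sim <- replicate(reps, {
    p <- sample(n)                    # random relabeling of b's vertices
    cor(offdiag(a), offdiag(b[p, p]))
  })
  list(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

a <- matrix(0, 6, 6)
a[cbind(c(1, 1, 2, 3, 4, 5, 6), c(2, 3, 4, 5, 6, 1, 2))] <- 1

set.seed(7)
res <- qap_cor_test(a, a, reps = 500)   # a graph compared with itself
res$obs                                 # 1 by construction
res$p_upper                             # share of relabelings matching the observed value
```

Permuting rows and columns jointly preserves the whole unlabeled structure of the graph, which is what distinguishes QAP from a CUG test conditioning on a few summary statistics.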


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306

R> gcor(g1, g3)
[1] 0.08908708

R> gcor(g2, g3)
[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
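The point estimates in such a model are those of an ordinary regression on vectorized adjacency matrices. The base-R sketch below reproduces that part of the example with lm on the off-diagonal cells; the random graphs are generated with rbinom as a stand-in for rgraph, and the QAP significance tests, which are the distinctive part of netlm, are not reproduced.

```r
set.seed(1)
n <- 20
offdiag <- function(m) m[row(m) != col(m)]

# Four random loopless digraphs of order 20 (stand-ins for rgraph(20, 4)).
x <- replicate(4, {
  m <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(m) <- 0
  m
}, simplify = FALSE)

# Dependent graph built exactly as in the example: x4 plays no role.
y <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]

dat <- data.frame(y  = offdiag(y),
                  x1 = offdiag(x[[1]]), x2 = offdiag(x[[2]]),
                  x3 = offdiag(x[[3]]), x4 = offdiag(x[[4]]))
fit <- lm(y ~ x1 + x2 + x3 + x4, data = dat)

round(coef(fit), 6)    # intercept 0, then 1, 4, 2, 0
df.residual(fit)       # 375 = 20 * 19 cells minus 5 coefficients
```

The 375 residual degrees of freedom match the summary output above, which is a useful check that the diagonal has been excluded.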

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

27 Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
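The correspondence between the two reporting likelihoods can be made concrete. The following is a minimal sketch in base R, under the assumption that the Batchelder and Romney process is described by a competency (the chance that a source knows the true edge state) and a bias (the chance that a guess reports an edge). The function and argument names are hypothetical and are not part of sna.

```r
# Hypothetical helpers (not part of sna): convert between the
# competency/bias parameterization and false positive/negative rates.
# Assumption: with probability `comp` the source reports the truth;
# otherwise it guesses "edge present" with probability `bias`.
br_to_error <- function(comp, bias) {
  list(ep = (1 - comp) * bias,        # false positive: guess 1 when truth is 0
       em = (1 - comp) * (1 - bias))  # false negative: guess 0 when truth is 1
}

error_to_br <- function(ep, em) {
  stopifnot(all(ep + em < 1))         # the mapping requires ep + em < 1
  list(comp = 1 - ep - em,            # implied competency
       bias = ep / (ep + em))         # implied guessing bias (needs ep + em > 0)
}

err  <- br_to_error(comp = 0.6, bias = 0.1)
back <- error_to_br(err$ep, err$em)
```

When the two error rates sum to more than 1, the implied competency would be negative; this is the negatively informative case that only the error-rate parameterization admits.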

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
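The conditional nomination probability above is simple to evaluate directly. The sketch below is a base R illustration only; the helper name and its arguments are assumptions introduced here, and it is not the code used by bn.

```r
# Hypothetical helper (not part of sna): conditional nomination probability
# under a biased net model.
#   p_base : Pr(B_e), the baseline probability of the i -> j nomination
#   p_bias : vector of bias event probabilities Pr(B_k)
#   s      : vector of counts s_k(i, j, T), the number of times each bias
#            event could have occurred for the (i, j) dyad given trace T
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# With no applicable bias events, the baseline probability is recovered.
p0 <- biased_net_prob(0.05, p_bias = c(0.3, 0.2), s = c(0, 0))

# One potential event of the first type and two of the second
# (the event types here are purely illustrative).
p1 <- biased_net_prob(0.05, p_bias = c(0.3, 0.2), s = c(1, 2))
```

Because the nomination fails only if the baseline event and every potential bias event all fail, adding potential bias events can only raise the nomination probability.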

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
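Equations (4) and (5) can be solved for the response vector, which is useful when simulating data from the model. The short derivation below assumes that both $I - \sum_i \theta_i W_i$ and $I - \sum_i \psi_i Z_i$ are invertible, a condition not stated explicitly in the text:

```latex
\begin{align*}
\varepsilon &= \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \\
y &= \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1} (X\beta + \varepsilon)
   = \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1}
     \Bigl[X\beta + \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu\Bigr].
\end{align*}
```

In this reduced form the AR operator multiplies the covariate term $X\beta$ as well as the disturbances, whereas the MA operator acts on $\nu$ alone, which is the pragmatic distinction between the two effect types.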


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
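One way to check a candidate pair of error rate priors against this condition is to estimate the prior mass on the region em + ep > 1 by simulation. The following base R sketch assumes independent beta priors on the two rates; the helper name is hypothetical and the check is not an sna facility.

```r
# Monte Carlo estimate of the prior mass on the region em + ep > 1,
# assuming independent beta priors on the two error rates.
# Each prior is given as c(shape1, shape2). Hypothetical helper, not sna code.
perverse_mass <- function(em_prior, ep_prior, n = 1e5) {
  em <- rbeta(n, em_prior[1], em_prior[2])
  ep <- rbeta(n, ep_prior[1], ep_prior[2])
  mean(em + ep > 1)
}

set.seed(1)
mass_generic  <- perverse_mass(c(2, 11), c(2, 11))  # beta(2, 11) on both rates
mass_perverse <- perverse_mass(c(11, 2), c(11, 2))  # deliberately poor priors
```

Under beta(2, 11) priors on both rates the estimated mass is very close to zero, whereas reversing the shape parameters places nearly all of the prior mass in the region.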

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
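The behavior of the data analytic estimators is easy to see from their definitions. The sketch below gives base R versions of the LAS and central graph rules as they are usually described: LAS uses only the reports of the two endpoints of each potential edge, and the central graph is a majority vote across informants. These are illustrative re-implementations under those assumptions, with hypothetical function names, and are not the code used by consensus.

```r
# Illustrative base-R versions of three classical estimators (a sketch, not
# sna's implementation). dat is an m x n x n array of 0/1 reports, with
# dat[k, i, j] holding informant k's report on the i -> j edge. The LAS
# rules assume m == n, with informant k being vertex k.
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      own <- c(dat[i, i, j], dat[j, i, j])  # reports of the two endpoints
      out[i, j] <- if (rule == "intersection") min(own) else max(own)
    }
  }
  out
}

# Majority rule across all informants.
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Three informants on three vertices; informant 3 misses the 1 -> 3 edge.
truth <- matrix(0, 3, 3)
truth[1, 3] <- 1
dat <- array(rep(truth, each = 3), dim = c(3, 3, 3))
dat[3, 1, 3] <- 0
```

A single false negative from either endpoint removes the edge under the intersection rule, while the union rule and the majority vote both retain it. If the two endpoint reports err independently with a false negative rate near 0.375, the intersection rule misses a true edge with probability 1 - 0.625^2, or roughly 0.61.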

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


with the magnitude of the specified effects depending on the exact choice of parameters.

Finally, we note that random graphs can also be produced by modifying existing networks. For instance, the Watts and Strogatz (1998) "rewiring" process takes an input network and (with specified probability) exchanges each non-null dyad with a randomly chosen null dyad sharing exactly one endpoint with the original dyad. Such a process obviously conserves edges, e.g.:

R> g <- matrix(0, 10, 10)
R> g[1, ] <- 1
R> g2 <- rewire.ws(g, 0.5)[1, , ]
R> g2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    1    1    1    1    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     1
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    0    1    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    1    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    1     0
R> sum(g - g2) == 0
[1] TRUE

Another example of an edge-preserving random transformation is the random permutation of vertex order; rmperm can be employed for this purpose, as for example in the following permutation of the graph g2 above:

R> g3 <- rmperm(g2)
R> all(sort(apply(g2, 2, sum)) == sort(apply(g3, 2, sum)))
[1] TRUE

Row/column permutation preserves the "unlabeled" structure of the input graph (i.e., it draws from the graph's isomorphism class), and plays an important role in certain test procedures for matrix comparison (Hubert 1987; Krackhardt 1987b).
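The operation itself amounts to applying one reordering to both the rows and the columns of the adjacency matrix. A minimal base R sketch of the idea follows; the function name is hypothetical and this is not sna's rmperm code.

```r
# Base-R sketch of a row/column permutation of an adjacency matrix
# (illustrates the idea only; not sna's implementation).
permute_graph <- function(g, ord = sample(nrow(g))) {
  g[ord, ord, drop = FALSE]  # same reordering applied to rows and columns
}

set.seed(1)
g <- matrix(rbinom(36, 1, 0.3), 6, 6)  # small random digraph
diag(g) <- 0                           # no self-ties
gp <- permute_graph(g)
```

Because vertices are only relabeled, the edge count, the sorted in- and out-degree sequences, and the empty diagonal are all unchanged.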

2.2. Visualization and data manipulation

Visualization and manipulation of relational data is a central task of relational analysis, and sna has a number of functions which are intended to facilitate this process. Some of these functions are quite basic: for instance, diag.remove, lower.tri.remove, and upper.tri.remove extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present, versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)

[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts, whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple, exploratory applications, and does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
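The generic change-score idea can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the package's own code: edge_perturbation and transitive_triples are hypothetical names, and the statistic is simply evaluated twice, which is exactly the inefficiency noted above.

```r
# Hypothetical helper: statistic with edge (i, j) forced present,
# minus the same statistic with the edge forced absent.
edge_perturbation <- function(adj, i, j, stat, ...) {
  present <- adj
  present[i, j] <- 1
  absent <- adj
  absent[i, j] <- 0
  stat(present, ...) - stat(absent, ...)
}

# Example statistic: number of transitive triples (i -> j -> k with i -> k).
transitive_triples <- function(adj) {
  diag(adj) <- 0
  sum((adj %*% adj) * adj)
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 3] <- 1          # directed path 1 -> 2 -> 3
change <- edge_perturbation(adj, 1, 3, transitive_triples)
edge_change <- edge_perturbation(adj, 1, 3, sum)
```

Adding the edge (1, 3) closes the single two-path from 1 to 3, so the triple count changes by one; a specialized change-score routine could obtain the same number from the local two-path count alone, without recomputing the full statistic.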

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
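Both ideas are easy to prototype. The sketch below uses base R only and makes no claim about how rperm or numperm are implemented or indexed; restricted_perm and nth_perm are hypothetical helpers, and the zero-based index in nth_perm is a choice made for this example.

```r
# Hypothetical helper: uniform random permutation that only exchanges
# positions belonging to the same (user-supplied) class.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # Shuffle positions via sample.int() to avoid the sample(x) pitfall
    # when a class has a single member.
    perm[idx] <- idx[sample.int(length(idx))]
  }
  perm
}

# Hypothetical helper: the k-th permutation of 1..n in lexical order,
# with k counted from 0.
nth_perm <- function(n, k) {
  items <- seq_len(n)
  out <- integer(n)
  for (pos in seq_len(n)) {
    block <- factorial(n - pos)      # permutations sharing each leading item
    pick <- k %/% block + 1
    out[pos] <- items[pick]
    items <- items[-pick]
    k <- k %% block
  }
  out
}

set.seed(42)
classes <- c(1, 1, 2, 2, 2, 3)
p <- restricted_perm(classes)

# Enumerate all 3! permutations of 1..3 in lexical order, one per row.
all_perms <- t(sapply(0:5, function(k) nth_perm(3, k)))
```

Enumerating by index, as in all_perms, is what makes exhaustive search or sampling without replacement straightforward: one samples distinct indices and maps each to its permutation.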

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $i,j$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
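The definitions above translate fairly directly into code. The following base-R sketch is illustrative only and is not the neighborhood function itself: geodesic_dist and neighborhood_sketch are hypothetical names, distances are found by repeated boolean matrix products (adequate for small graphs), and self-pairs are excluded from cumulative neighborhoods of order $k > 0$ as an assumption of this example.

```r
# Geodesic distances on relation rel via repeated boolean matrix products.
geodesic_dist <- function(rel) {
  n <- nrow(rel)
  dist <- matrix(Inf, n, n)
  diag(dist) <- 0
  reach <- diag(n)                          # walks of length 0
  for (k in seq_len(n - 1)) {
    reach <- ((reach %*% rel) > 0) * 1      # pairs joined by a length-k walk
    newly <- reach == 1 & is.infinite(dist)
    dist[newly] <- k                        # first length reached = distance
  }
  dist
}

# Hypothetical analogue of a partial/cumulative neighborhood computation.
neighborhood_sketch <- function(adj, k, type = c("out", "in", "total"),
                                partial = TRUE) {
  type <- match.arg(type)
  a <- (adj > 0) * 1
  diag(a) <- 0
  rel <- switch(type,
                out = a,                            # G itself
                "in" = t(a),                        # transpose of G
                total = ((a + t(a)) > 0) * 1)       # underlying graph
  if (k == 0) return(diag(nrow(a)))
  d <- geodesic_dist(rel)
  if (partial) (d == k) * 1 else (d >= 1 & d <= k) * 1
}

# Directed path 1 -> 2 -> 3 -> 4.
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 3] <- adj[3, 4] <- 1
partial2 <- neighborhood_sketch(adj, 2)
cumul2   <- neighborhood_sketch(adj, 2, partial = FALSE)
in1      <- neighborhood_sketch(adj, 1, type = "in")
```

On the directed path, the order-2 partial out-neighborhood contains only the pairs (1, 3) and (2, 4), while the cumulative version adds the three original edges.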

To illustrate sna's egocentric network tools, we begin by generating a sample network, and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]

$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)

[1] TRUE

R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)

[1] TRUE

R> all(sapply(g.comb, NROW) <= degree(g) + 1)

[1] TRUE

R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x){gden(x[-1,-1])})

         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
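For readers who want to see the "ego first, then drop ego" convention spelled out, here is a minimal base-R sketch. It is not ego.extract or gden: ego_net and alter_density are hypothetical helpers, combined (in or out) neighborhoods are assumed, and the density is the directed, loop-free version.

```r
# Hypothetical helper: induced subgraph on ego and its combined neighborhood,
# with ego placed in the first row/column.
ego_net <- function(adj, v) {
  others <- seq_len(nrow(adj)) != v
  alters <- which((adj[v, ] > 0 | adj[, v] > 0) & others)
  keep <- c(v, alters)
  adj[keep, keep, drop = FALSE]
}

# Density of ties among alters only (ego's row and column are removed).
alter_density <- function(ego_adj) {
  alt <- ego_adj[-1, -1, drop = FALSE]
  m <- nrow(alt)
  if (m < 2) return(NA_real_)        # undefined with fewer than two alters
  diag(alt) <- 0
  sum(alt) / (m * (m - 1))
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[1, 3] <- adj[2, 3] <- adj[4, 1] <- 1
ego1 <- ego_net(adj, 1)              # alters of vertex 1 are 2, 3 and 4
dens1 <- alter_density(ego1)         # one tie (2 -> 3) among six ordered pairs
dens4 <- alter_density(ego_net(adj, 4))
```

Returning NA for egos with fewer than two alters plays the same role as the ego.size > 2 guard in the example above.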

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))

[1] TRUE

R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))

[1] TRUE

R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)

[1] TRUE

R> gapply(g, c(1, 2), 1:6, mean)

[1] 4.00 3.00 3.00 5.50 3.25 3.25

R> gapply(g, c(1, 2), 1:6, mean, distance = 2)

[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i,,], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i,,], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout.*), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$. [Nine panels, titled "Partial Neighborhood of Order 1" through "Partial Neighborhood of Order 9".]

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout.* functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$. [Nine panels, titled "Cumulative Neighborhood of Order 1" through "Cumulative Neighborhood of Order 9".]

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. [Six panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options.]

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions. [Three panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.]

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3-5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v,G) \equiv c_{d^+}(v,G) + c_{d^-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v',v,v'',G)}{g(v',v'',G)}$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
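To make the three Freeman definitions concrete, the following base-R sketch computes them by brute force on a small graph. It is illustrative only and is not how the degree, betweenness, or closeness functions are implemented: freeman_sketch is a hypothetical name, geodesics are counted via matrix powers, and the betweenness sum runs over ordered pairs, so that scores on an undirected graph are twice the value obtained by counting each unordered pair once.

```r
freeman_sketch <- function(adj) {
  n <- nrow(adj)
  a <- (adj > 0) * 1
  diag(a) <- 0
  dist <- matrix(Inf, n, n)
  diag(dist) <- 0
  npaths <- diag(n)                  # geodesic counts; one trivial path i -> i
  walks <- diag(n)
  for (k in seq_len(n - 1)) {
    walks <- walks %*% a             # walks[i, j]: number of length-k walks
    newly <- walks > 0 & is.infinite(dist)
    dist[newly] <- k                 # first length reached is the distance
    npaths[newly] <- walks[newly]    # shortest walks are exactly the geodesics
  }
  btw <- numeric(n)
  for (v in seq_len(n)) for (s in seq_len(n)) for (w in seq_len(n)) {
    if (s == v || w == v || s == w) next
    on_geodesic <- is.finite(dist[s, w]) &&
      dist[s, v] + dist[v, w] == dist[s, w]
    if (on_geodesic)
      btw[v] <- btw[v] + npaths[s, v] * npaths[v, w] / npaths[s, w]
  }
  list(outdegree = rowSums(a), indegree = colSums(a),
       betweenness = btw,
       closeness = (n - 1) / rowSums(dist))  # 0 if any distance is infinite
}

# Undirected path 1 - 2 - 3, stored as a symmetric adjacency matrix.
path <- rbind(c(0, 1, 0),
              c(1, 0, 1),
              c(0, 1, 0))
idx <- freeman_sketch(path)
```

On the three-vertex path, only the middle vertex lies on a geodesic between others, and it is also the only vertex at distance one from everyone else, so it has the largest betweenness and closeness.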

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure, for user-specified values of $\beta$. The scaling parameter, $\alpha$, is by convention set so that the centrality vector has squared length equal to $|V|$ (consistent with the bonpow output shown below); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
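The two limiting cases of the Bonacich measure can be verified numerically from the formula $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$. The sketch below uses base R only and is not the bonpow implementation; bonacich_sketch is a hypothetical helper, $\alpha$ is simply fixed at 1, and the test graph is an arbitrary small undirected example chosen for illustration.

```r
# alpha is left at 1 here; rescaling is a separate, conventional step.
bonacich_sketch <- function(adj, beta, alpha = 1) {
  n <- nrow(adj)
  as.vector(alpha * solve(diag(n) - beta * adj, adj %*% rep(1, n)))
}

# Undirected triangle (1, 2, 3) with a pendant vertex 4 attached to 3.
adj <- rbind(c(0, 1, 1, 0),
             c(1, 0, 1, 0),
             c(1, 1, 0, 1),
             c(0, 0, 1, 0))

# beta = 0: the measure reduces to A 1, i.e., (out)degree.
at_zero <- bonacich_sketch(adj, beta = 0)

# beta close to 1 / lambda_1: scores align with the principal eigenvector.
eig <- eigen(adj)
lambda1 <- eig$values[1]             # principal eigenvalue
ev <- abs(eig$vectors[, 1])          # eigenvector centrality
near_limit <- bonacich_sketch(adj, beta = 0.999 / lambda1)
agreement <- cor(near_limit, ev)
```

Using solve() on the linear system rather than forming the inverse explicitly is a numerical-stability choice of this sketch; it fails, as it should, when $\beta$ equals the reciprocal of an eigenvalue and no solution exists.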

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ over $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
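The brokerage definitions can be checked by direct enumeration. The following base-R sketch is a hedged illustration, not the brokerage function: total_brokerage and brokerage_role are hypothetical helpers, and only the raw counts and role labels are computed (none of the Gould-Fernandez expectations or z-scores).

```r
# Number of ordered pairs (a, b) locally bridged by v:
# a -> v and v -> b are present, while a -> b is absent.
total_brokerage <- function(adj, v) {
  n <- nrow(adj)
  count <- 0
  for (a in seq_len(n)) for (b in seq_len(n)) {
    if (a == b || a == v || b == v) next
    if (adj[a, v] > 0 && adj[v, b] > 0 && adj[a, b] == 0) count <- count + 1
  }
  count
}

# Role of broker j in the ordered triad (i, j, k), given the three states.
brokerage_role <- function(s_i, s_j, s_k) {
  if (s_i == s_j && s_j == s_k) {
    "coordinating"
  } else if (s_i == s_k) {
    "itinerant"          # s_i = s_k, broker in a different state
  } else if (s_j == s_k) {
    "gatekeeping"        # source outside, broker and target together
  } else if (s_i == s_j) {
    "representative"     # source and broker together, target outside
  } else {
    "liaison"            # all three states differ
  }
}

adj <- matrix(0, 3, 3)
adj[1, 2] <- adj[2, 3] <- 1          # 1 -> 2 -> 3, with 1 -> 3 absent
states <- c(1, 2, 1)
t2 <- total_brokerage(adj, 2)
role <- brokerage_role(states[1], states[2], states[3])
```

The ordering of the branches matters: each later test relies on the earlier ones having failed, which is what guarantees the inequality conditions in the definitions without testing them explicitly.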

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")

[1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")

[1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)

[1] 10 7 13 4 9 9 8 8 8 12

R> closeness(dat)

[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)

[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)

[1] 21 6 27 1 14 15 6 7 7 21

R> graphcent(dat)

[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)

[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)

[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")

[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)

[1] TRUE

R> bonpow(dat, exponent = -0.5)

[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i,j,k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores^2) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)^3, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

^2 For instance, when all vertices are automorphically equivalent.
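As a quick numerical check that Equations 1 and 2 agree, the following base-R sketch evaluates both on the degree scores of a four-vertex star. It is not the centralization function; centralization_raw is a hypothetical helper, and the normalizing constant $(n-1)(n-2)$ is the standard theoretical maximum for undirected degree centralization, stated here as an assumption of the example.

```r
# Equation 1: total deviation from the maximum observed score.
centralization_raw <- function(scores) sum(max(scores) - scores)

# Degree scores on the star K_{1,3}: one hub tied to three leaves.
scores <- c(3, 1, 1, 1)
n <- length(scores)

eq1 <- centralization_raw(scores)                  # 0 + 2 + 2 + 2
eq2 <- n * (max(scores) - mean(scores))            # 4 * (3 - 1.5)

# Normalized score, assuming the undirected-degree maximum (n - 1)(n - 2).
normalized <- eq1 / ((n - 1) * (n - 2))
```

The star attains the maximum for degree, so its normalized score is 1; a graph in which every vertex has the same degree would give 0 by either equation.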

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree; hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices; hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))

R> gden(g)

[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)

[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")

[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)

[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)

[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")

[1]   0  21 106 254 582

R> connectedness(g)

[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)

[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")

[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)

[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
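To make the distinction between the reciprocity variants concrete, the quantities can be computed directly from an adjacency matrix. This is a rough base-R sketch under stated assumptions (a loopless digraph with no missing data, invented here for illustration); it does not call gden or grecip and ignores the extra cases those functions handle.

```r
# Assumption: a directed graph without loops, stored as a 0/1 adjacency matrix.
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 1] <- 1   # mutual dyad {1, 2}
A[1, 3] <- 1                 # asymmetric dyad {1, 3}
A[3, 4] <- 1                 # asymmetric dyad {3, 4}

n <- nrow(A)
off <- row(A) != col(A)                  # off-diagonal cells
density <- sum(A[off]) / (n * (n - 1))   # fraction of possible edges present

up <- upper.tri(A)
ab <- A[up]                              # tie i -> j for each dyad with i < j
ba <- t(A)[up]                           # tie j -> i for the same dyad
mutual <- sum(ab == 1 & ba == 1)
null   <- sum(ab == 0 & ba == 0)
asym   <- sum(ab != ba)

dyadic_recip   <- (mutual + null) / (mutual + asym + null)  # symmetric dyads
nonnull_recip  <- mutual / (mutual + asym)                  # among non-null dyads
edgewise_recip <- 2 * mutual / (2 * mutual + asym)          # reciprocated edges
```

In this toy graph the three reciprocity notions take three different values, because the three null dyads count as symmetric only under the dyadic definition.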

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if (tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }

R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let G = (V, E) be a (possibly directed) graph of order N, and let d(i, j) be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
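The definition of $s_i$ can be evaluated directly by tracking which ordered pairs lie within distance i of one another. The sketch below is one simple way to do so in base R, using repeated boolean matrix products rather than an explicit geodesic computation; the function name structure_stats is hypothetical and this is not how sna's structure.statistics is implemented.

```r
# Structure statistics s_0, ..., s_{N-1} for a 0/1 adjacency matrix A.
# (Hypothetical helper; an illustrative sketch only.)
structure_stats <- function(A) {
  n <- nrow(A)
  step <- ((A > 0) | (diag(n) == 1)) * 1   # I + A as a 0/1 matrix
  reach <- diag(n)                         # pairs with d(j, k) <= 0
  s <- numeric(n)
  for (i in 0:(n - 1)) {
    s[i + 1] <- sum(reach) / n^2           # fraction of pairs with d(j, k) <= i
    reach <- ((reach %*% step) > 0) * 1    # allow one more step
  }
  names(s) <- 0:(n - 1)
  s
}

# Directed path 1 -> 2 -> 3 -> 4
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
s <- structure_stats(A)
```

For the directed path, the series rises from 4/16 (each vertex reaches itself) to 10/16, and never reaches 1 because the path is not strongly connected.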

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
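The logic of a CUG test can be sketched with a few lines of base R. The example below is illustrative only: it conditions on order and number of edges, uses the count of mutual dyads as the statistic t, and invents both the "observed" graph and the helper names (mutuals, r_fixed_edges). It does not reproduce cugtest's interface or options.

```r
set.seed(1)
n <- 10
# "Observed" digraph: generated at random purely for illustration
obs <- matrix(rbinom(n * n, 1, 0.2), n, n)
diag(obs) <- 0

# Test statistic t: number of mutual dyads
mutuals <- function(A) sum(A * t(A)) / 2

# Draw a loopless digraph uniformly from those with n vertices and m edges
r_fixed_edges <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))     # off-diagonal positions
  A[sample(cells, m)] <- 1
  A
}

reps <- 1000
m <- sum(obs)
sims <- replicate(reps, mutuals(r_fixed_edges(n, m)))
p_upper <- mean(sims >= mutuals(obs))  # Monte Carlo estimate of Pr(t(H) >= t(G))
p_lower <- mean(sims <= mutuals(obs))  # Monte Carlo estimate of Pr(t(H) <= t(G))
```

Because every simulated value satisfies at least one of the two inequalities, the two estimated p-values always sum to at least 1; they sum to more than 1 whenever some draws tie the observed statistic exactly.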

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg  v1  v2  v3  v4  v5  v6  v7  v8  v9 v10
2   9   2   1   2   0   3   2   0   4   3   1
3  24   7   1  11   0  15   9   2  12   8   7
4  42  16   1  23   2  32  26   3  30  19  16
5  72  39   5  48   8  60  54  10  57  36  43

R> component.dist(g3[1, , ])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]

R> g4[2, , ] <- g2[1, , ]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v′ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and v′ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
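The distance-then-cluster workflow described above can be sketched using only base R. In the following, se_hamming is a hypothetical helper (not sedist) that counts disagreements between two vertices' ties to and from third parties, following the neighborhood definition of structural equivalence given earlier; the toy digraph is constructed so that two pairs of vertices are exactly equivalent.

```r
# Toy digraph (an assumption for illustration): vertices 1 and 2 are built to be
# structurally equivalent, as are vertices 3 and 4.
A <- matrix(0, 5, 5)
A[c(1, 2), c(3, 4)] <- 1     # 1 and 2 both send ties to 3 and 4
A[c(3, 4), 5] <- 1           # 3 and 4 both send ties to 5

# Hamming-type structural equivalence distance (hypothetical helper)
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    others <- setdiff(seq_len(n), c(i, j))
    # Compare out-ties and in-ties involving third parties only
    D[i, j] <- sum(A[i, others] != A[j, others]) +
               sum(A[others, i] != A[others, j])
  }
  D
}

D <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")   # cluster on the distances
classes <- cutree(hc, h = 0)                    # keep only merges at distance 0
```

Cutting at height zero recovers exact structural equivalence classes; cutting at a larger height yields approximate positions, which is the usual situation with empirical data.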

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the i, jth block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
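Density reduction and expansion are simple enough to sketch in base R. The helpers block_density and expand_blocks below are hypothetical stand-ins written for illustration (they are not blockmodel or blockmodel.expand), and the input graph and class memberships are invented.

```r
set.seed(42)
# Assumptions: `A` is a directed 0/1 adjacency matrix and `classes` a membership
# vector; both are invented for illustration.
classes <- c(1, 1, 1, 2, 2, 2)
A <- matrix(0, 6, 6)
A[classes == 1, classes == 2] <- 1      # class 1 sends ties to class 2

# Reduction: density of each block, ignoring the diagonal
block_density <- function(A, classes) {
  k <- max(classes)
  B <- matrix(NA_real_, k, k)
  offdiag <- row(A) != col(A)
  for (r in seq_len(k)) for (s in seq_len(k)) {
    cells <- outer(classes == r, classes == s) & offdiag
    B[r, s] <- mean(A[cells])
  }
  B
}

# Expansion: treat block densities as edge probabilities for a new graph
expand_blocks <- function(B, sizes) {
  new_classes <- rep(seq_along(sizes), sizes)
  P <- B[new_classes, new_classes]       # Bernoulli parameter matrix
  n <- length(new_classes)
  G <- matrix(rbinom(n * n, 1, P), n, n)
  diag(G) <- 0
  G
}

B <- block_density(A, classes)
G <- expand_blocks(B, sizes = c(2, 4))   # new graph with different class sizes
```

Because the block densities here are all 0 or 1, the expanded graph is deterministic; with intermediate densities, each call would produce a different draw from the same inhomogeneous Bernoulli model.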

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set V is the graph on V such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
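The boolean inner product, and the way loops can arise from loopless inputs, can be seen in a two-vertex example. This base-R sketch uses a hypothetical compose helper rather than sna's own operator.

```r
# Boolean inner product of two 0/1 adjacency matrices:
# (v, v') is tied iff some v'' has A[v, v''] = 1 and B[v'', v'] = 1.
# (Hypothetical helper for illustration.)
compose <- function(A, B) {
  ((A %*% B) > 0) * 1
}

# Directed 2-cycle: 1 -> 2 and 2 -> 1, with no loops
A <- matrix(c(0, 1,
              1, 0), 2, 2, byrow = TRUE)
AA <- compose(A, A)
```

Composing the 2-cycle with itself yields a graph consisting only of loops, since the only two-paths lead from each vertex back to itself.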

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
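Equation (3) and the associated correlation and Hamming distance translate almost line for line into base R. The functions graph_cov, graph_cor, and hamming below are hypothetical illustrations for the directed, loopless case, not the sna routines gcov, gcor, and hdist.

```r
# Graph covariance as in (3): average over all ordered pairs (i, j), i != j
graph_cov <- function(A, B) {
  off <- row(A) != col(A)
  a <- A[off]
  b <- B[off]
  mean((a - mean(a)) * (b - mean(b)))    # divides by |V| (|V| - 1)
}

graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}

# Hamming distance: number of ordered pairs on which the graphs disagree
hamming <- function(A, B) {
  off <- row(A) != col(A)
  sum(A[off] != B[off])
}

# G is a transitive tournament on 3 vertices; H reverses every edge
G <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
H <- t(G)
```

Since H has an edge exactly where G does not, the pair attains the minimum correlation and the maximum Hamming distance, even though the two graphs are isomorphic; this is the kind of case that motivates the structural (unlabeled) measures.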

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
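The QAP null hypothesis amounts to relabeling the vertices of one graph while holding the other fixed. A minimal base-R sketch of a one-tailed QAP test for graph correlation follows; the data are simulated for illustration, gcor_simple is a hypothetical helper, and none of this reproduces qaptest's interface.

```r
set.seed(7)
n <- 12
x <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(x) <- 0
# y copies x with roughly 10% of cells flipped, so the two graphs are related
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)
y <- abs(x - flip)
diag(y) <- 0

# Product-moment correlation over off-diagonal cells (hypothetical helper)
gcor_simple <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

obs <- gcor_simple(x, y)
reps <- 1000
null <- replicate(reps, {
  p <- sample(n)               # random relabeling of the vertices of x
  gcor_simple(x[p, p], y)      # rows and columns are permuted together
})
p_value <- mean(null >= obs)   # one-tailed (upper) QAP p-value estimate
```

Permuting rows and columns simultaneously preserves the structure of x exactly while breaking its alignment with y, which is what distinguishes the QAP null from one that shuffles individual cells independently.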


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (i.e., structural) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

R> nl <- netlm(y, x)

R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> y.l <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

R> y.p <- apply(y.l, c(1, 2), function(a) { 1 / (1 + exp(-a)) })

R> y <- rgraph(20, tprob = y.p)

R> nl <- netlogit(y, x)

R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
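When the error rates are taken as known, the posterior for a single edge follows from Bayes' rule alone, which gives some intuition for the error-rate parameterization. The sketch below assumes conditionally independent reports with common false positive and false negative rates; edge_posterior is a hypothetical helper and is far simpler than bbnam, which samples error rates and the graph jointly.

```r
# Posterior probability that an edge is present, given informant reports and
# known error rates. (Hypothetical helper; a simplified special case.)
edge_posterior <- function(reports, prior, fp, fn) {
  # reports: 0/1 vector, one entry per informant, concerning a single edge
  # fp, fn:  false positive and false negative rates (assumed known)
  lik1 <- prod(ifelse(reports == 1, 1 - fn, fn))   # Pr(reports | edge present)
  lik0 <- prod(ifelse(reports == 1, fp, 1 - fp))   # Pr(reports | edge absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Two informants report the edge and one does not
post <- edge_posterior(c(1, 1, 0), prior = 0.5, fp = 0.1, fn = 0.2)

# With fp + fn > 1, reports are negatively informative: a positive report
# lowers the posterior below the prior
post_neg <- edge_posterior(1, prior = 0.5, fp = 0.7, fn = 0.6)
```

The second call illustrates the negatively informative case mentioned above: when informants are wrong more often than right, a report of a tie is evidence against it.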

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model, then, involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
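The conditional nomination probability of the biased net model is simple to evaluate once the bias events and their opportunity counts are given. The sketch below is a direct base R transcription of that formula; the function name biased_net_prob and the example bias probabilities are hypothetical, and the code is not part of sna's bn routine.

```r
# Probability that i nominates j: one minus the probability that neither the
# baseline event nor any of the available bias events fires.
#   p_base : Pr(B_e), the baseline nomination probability
#   p_bias : vector of Pr(B_k), one entry per bias type
#   s      : vector of s_k, the number of opportunities for each bias type
biased_net_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Two illustrative bias types: the first has one opportunity to fire,
# the second has two.
pr <- biased_net_prob(p_base = 0.05, p_bias = c(0.3, 0.1), s = c(1, 2))
print(pr)

# With no bias opportunities the probability reduces to the baseline.
print(biased_net_prob(0.05, c(0.3, 0.1), c(0, 0)))
```

Because the events are treated as independent, each additional bias opportunity can only raise the nomination probability.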

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

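$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$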

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
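As a small illustration of the autocorrelation indices mentioned above, Moran's I can be computed directly from a weight matrix and an attribute vector. This is a base R sketch using the standard textbook definition; the function name morans_i is hypothetical, and the code makes no claim about how nacf normalizes its own output.

```r
# Moran's I for attribute vector x and weight (adjacency) matrix W:
#   I = (n / sum(W)) * sum_ij W_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
morans_i <- function(x, W) {
  stopifnot(nrow(W) == length(x), ncol(W) == length(x))
  n <- length(x)
  z <- x - mean(x)                       # centered attribute values
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Undirected path 1 - 2 - 3 - 4 with an attribute that increases along it,
# so that adjacent vertices tend to have similar values.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)
x <- c(1, 2, 3, 4)
i_val <- morans_i(x, W)
print(i_val)
```

A positive value indicates that tied vertices have more similar attribute values than expected under independence, which is the pattern a network AR or MA term is intended to capture.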

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
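The difference between these data analytic estimators is easy to see on a tiny example. The base R sketch below implements simplified versions of the central graph and LAS rules on an informant-by-sender-by-receiver array; it assumes that informant k is also vertex k, that ties in the majority rule are resolved toward edge absence, and the function names central_graph and las are hypothetical. It is not the code used by consensus.

```r
# Central graph: keep an edge if a strict majority of informants report it.
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Locally aggregated structure: use only the reports of the two parties
# to each potential edge (sender i and receiver j).
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    own <- c(dat[i, i, j], dat[j, i, j])   # i's and j's reports on (i, j)
    out[i, j] <- if (rule == "intersection") min(own) else max(own)
  }
  out
}

# True graph: a directed 3-cycle, 1 -> 2 -> 3 -> 1.
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
dat <- array(0, dim = c(3, 3, 3))
dat[1, , ] <- g                                 # informant 1 reports perfectly
miss <- g; miss[1, 2] <- 0; dat[2, , ] <- miss  # informant 2: a false negative
extra <- g; extra[1, 3] <- 1; dat[3, , ] <- extra  # informant 3: a false positive

print(mean(central_graph(dat) == g))
print(mean(las(dat, "intersection") == g))
print(mean(las(dat, "union") == g))
```

Here a single false negative by one of the two parties removes an edge under the intersection rule, and a single false positive adds one under the union rule, while the majority rule absorbs both isolated errors.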

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot, with theoretical quantiles against sample quantiles; Net Influence Plot.)

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferlioj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

extend the assignment behavior of R's diag, lower.tri, and upper.tri functions to arrays; gvectorize and sr2css convert network data from one form to another; symmetrize, make.stochastic, and event2dichot perform basic data-normalizing operations on graphs or graph sets; add.isolates adds isolates to one or more input graphs; stackcount determines the number of graphs in an input stack; etc. Several other functions bear further explanation. For instance, eval.edgeperturbation is a wrapper function which computes the difference in the value of a graph statistic resulting from forcing the selected edge or edges to be present versus forcing them to be absent (holding all other edges constant). Such differences are used extensively in computation for simulation and inference from exponential random graph processes (see, e.g., Snijders 2002), and have also been used to assess structural robustness (Dodds et al. 2003; Borgatti et al. 2006). eval.edgeperturbation is flexible, and can be used with any graph-level index function. Its use is straightforward, i.e.:

R> g <- rgraph(5)
R> eval.edgeperturbation(g, 1, 2, centralization, betweenness)
[1] 0.07291667

Unfortunately, the drawback to the flexibility of this routine is its inefficiency: eval.edgeperturbation cannot take advantage of any special properties of the change-score being calculated, and hence is inefficient for properties such as triad counts whose changes can be calculated much more quickly than the base statistic. This function is hence a useful utility for simple exploratory applications, but does not replace the specialized (but less flexible) change-score functions used within packages such as ergm.
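The idea behind such a wrapper can be sketched in a few lines of base R: evaluate the statistic twice, once with the edge forced on and once with it forced off, and take the difference. The function names edge_perturbation, edge_count, and n_mutual below are hypothetical and are not sna functions; the sketch also shows why the generic approach is wasteful, since the full statistic is recomputed for both graphs.

```r
# Difference in a graph statistic between forcing edge (i, j) to be present
# and forcing it to be absent, holding all other edges constant.
edge_perturbation <- function(g, i, j, stat, ...) {
  g_present <- g
  g_present[i, j] <- 1
  g_absent <- g
  g_absent[i, j] <- 0
  stat(g_present, ...) - stat(g_absent, ...)
}

# Two simple graph-level statistics for adjacency matrices with zero diagonal.
edge_count <- function(g) sum(g)
n_mutual <- function(g) sum(g * t(g)) / 2   # number of mutual dyads

g <- matrix(0, 3, 3)
g[2, 1] <- 1                                # a single edge 2 -> 1

print(edge_perturbation(g, 1, 2, edge_count))  # toggling any edge changes the count by one
print(edge_perturbation(g, 1, 2, n_mutual))    # 1 -> 2 would reciprocate 2 -> 1
print(edge_perturbation(g, 1, 3, n_mutual))    # 1 -> 3 would not be reciprocated
```

For a statistic such as the mutual dyad count, the change score depends only on the single reverse edge, which is the kind of shortcut a specialized change-score function can exploit.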

Another pair of useful but idiosyncratic utility functions are rperm and numperm, which produce permutation vectors with specified characteristics. (Recall that permuting a graph's adjacency matrix is equivalent to altering the "identities" of its vertices, while leaving the underlying "unlabeled" structure unchanged.) Although not graph manipulation functions per se, these routines are of importance for generating restricted permutations for use in QAP tests (Hubert 1987) and comparison of partially labeled graphs (Butts and Carley 2005). rperm draws a (uniform) random permutation vector such that vertices may only be exchanged if they belong to the same (user-supplied) equivalence class. numperm is a deterministic function which returns the nth (unconstrained) permutation in lexical sort order; this is useful for exhaustive search through a (hopefully small) permutation set, or when sampling permutations without replacement.
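A restricted permutation of the kind described above can be sketched in base R by shuffling positions independently within each class. The function restricted_perm and its classes argument are hypothetical names for illustration and do not reproduce rperm's interface.

```r
# Draw a random permutation vector in which positions may only be exchanged
# with other positions in the same class. Shuffling uniformly and
# independently within each class gives a uniform draw from the set of
# class-preserving permutations.
restricted_perm <- function(classes) {
  perm <- seq_along(classes)
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # sample.int avoids the pitfall of sample(idx) when idx has length 1
    if (length(idx) > 1) perm[idx] <- idx[sample.int(length(idx))]
  }
  perm
}

set.seed(42)
classes <- c(1, 1, 1, 2, 2, 3)
p <- restricted_perm(classes)
print(p)

# Relabel a graph: permute rows and columns of the adjacency matrix together.
g <- matrix(rbinom(36, 1, 0.3), 6, 6)
diag(g) <- 0
g_perm <- g[p, p]
```

Applying the same vector to rows and columns changes vertex identities only within classes, leaving the unlabeled structure (and hence statistics such as the edge count) unchanged.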

In addition to the above, two families of graph manipulation functions bear discussing in more detail. These are functions to compute properties of neighborhoods, and functions for graph visualization. Here, we briefly discuss each family in turn, before proceeding to a review of sna's descriptive index routines.

Neighborhood and ego net functions

The egocentric network (or "ego net") of vertex $v$ in graph $G$ is defined as $G[v \cup N(v)]$ (i.e., the subgraph of $G$ induced by $v$ and its neighborhood). ego.extract is a utility function which, for a given input graph (or set thereof), extracts the egocentric networks for one or more vertices. This can be a useful shortcut for computing local structural properties, or for simulating the effects of ego net sampling (see Marsden 2005). For directed graphs, it is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose $i, j$ cell is an indicator for the membership of vertex $j$ in vertex $i$'s selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders $k > 0$ depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation $R$ be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $k > 0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(i, j)$ having geodesic distance $k$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $k$ in $R$. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
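The definitions above translate directly into a small base R sketch. The functions geodesic_dist and neighborhood_sketch are hypothetical names written for illustration rather than sna's neighborhood implementation, and the sketch assumes that cumulative neighborhoods of order k > 0 exclude the vertex itself (pairs at distance 0).

```r
# Geodesic distances in relation R via repeated boolean matrix products:
# the first k at which a walk of length k exists from i to j is the distance.
geodesic_dist <- function(R) {
  n <- nrow(R)
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  reach <- diag(n)                       # pairs joined by a walk of length 0
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% R > 0) * 1       # pairs joined by a walk of length k
    newly <- reach == 1 & is.infinite(d)
    d[newly] <- k
  }
  d
}

# Partial or cumulative neighborhood of a given order for an adjacency matrix g.
neighborhood_sketch <- function(g, order = 1, type = c("total", "in", "out"),
                                cumulative = FALSE) {
  type <- match.arg(type)
  if (order == 0) return(diag(nrow(g)))  # 0th order: identity matrix
  # Base relation: underlying graph, transpose, or the graph itself.
  R <- switch(type, total = (g + t(g) > 0) * 1, "in" = t(g), out = g)
  d <- geodesic_dist(R)
  if (cumulative) (d >= 1 & d <= order) * 1 else (d == order) * 1
}

# Directed path 1 -> 2 -> 3 -> 4.
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1
n2_out <- neighborhood_sketch(g, order = 2, type = "out")
print(n2_out)
n2_total <- neighborhood_sketch(g, order = 2, type = "total", cumulative = TRUE)
print(n2_total)
```

On the directed path, the second-order out-neighborhood contains only the pairs (1, 3) and (2, 4), while the cumulative total neighborhood of order 2 contains every pair within two steps along the underlying undirected path.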

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5/9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]
$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)
[1] TRUE
R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)
[1] TRUE
R> all(sapply(g.comb, NROW) <= degree(g) + 1)
[1] TRUE
R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x){gden(x[-1,-1])})
         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
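To make the "ego first, then drop ego" convention concrete, the following base R sketch extracts a combined-neighborhood ego net and computes the density among alters. The helpers ego_net and alter_density are hypothetical stand-ins written for this illustration; they are not sna functions and make no claim to match the internals of ego.extract or gden.

```r
# Hypothetical helpers for illustration (not sna functions).

# Combined-neighborhood ego net with ego placed in the first row/column
ego_net <- function(adj, ego) {
  others <- setdiff(seq_len(nrow(adj)), ego)
  alters <- others[adj[ego, others] > 0 | adj[others, ego] > 0]
  idx <- c(ego, alters)
  adj[idx, idx, drop = FALSE]
}

# Density among alters only: drop ego's row and column, then count
# off-diagonal ties over the k * (k - 1) possible ordered pairs
alter_density <- function(x) {
  a <- x[-1, -1, drop = FALSE]
  k <- nrow(a)
  if (k < 2) return(NA_real_)
  diag(a) <- 0
  sum(a > 0) / (k * (k - 1))
}

# The second ego net from the example output above, entered by hand
x <- matrix(c(0, 1, 0, 0,
              1, 0, 0, 0,
              1, 0, 0, 0,
              1, 0, 1, 0), 4, 4, byrow = TRUE)
alter_density(x)   # one tie among three alters, i.e. 1/6
```

The value agrees with the 0.16666667 reported for the second ego net in the example output.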

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))
[1] TRUE
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE
R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)
[1] TRUE
R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25
R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0
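The idea behind these calls can be sketched without sna: select each vertex's neighbors, subset the covariate vector accordingly, and apply the function. The helper nbr_apply below is a hypothetical base R stand-in for first-order neighborhoods only; it is not the gapply source and does not reproduce gapply's argument conventions.

```r
# Hypothetical base-R stand-in for the idea behind gapply (not the sna code):
# apply FUN to the covariate values of each vertex's neighbors.
nbr_apply <- function(adj, x, FUN, mode = c("out", "in", "combined")) {
  mode <- match.arg(mode)
  rel <- switch(mode,
                "out"      = adj > 0,
                "in"       = t(adj) > 0,
                "combined" = (adj > 0) | (t(adj) > 0))
  diag(rel) <- FALSE  # a vertex is not its own neighbor
  sapply(seq_len(nrow(rel)), function(i) FUN(x[rel[i, ]]))
}

g <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 0, 0, 0), 4, 4, byrow = TRUE)
x <- c(10, 20, 30, 40)               # an illustrative vertex covariate

nbr_apply(g, rep(1, 4), sum)         # out-neighborhood sizes (outdegree)
nbr_apply(g, rep(1, 4), sum, "in")   # in-neighborhood sizes (indegree)
nbr_apply(g, x, mean, "combined")    # mean covariate value among all neighbors
```

Summing a vector of ones recovers degree, as in the example above; any other summary function can be substituted for sum or mean.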

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (nine panels, "Partial Neighborhood of Order 1" through "Partial Neighborhood of Order 9"); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (nine panels, "Cumulative Neighborhood of Order 1" through "Cumulative Neighborhood of Order 9"); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurve = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: "Default", "Curved Edges", "MDS Layout", "Circular Layout", "Sociomatrix", "Multiple Options").

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can, in and of itself, be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions (panels: "gplot.vertex Example", "gplot.arrow Example", "gplot.loop Example").

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005, chapters 3–5)), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{dt}(v, G) \equiv c_{d+}(v, G) + c_{d-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $g'(v', v, v'', G) / g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $n = |V|$ and $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
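The degree and closeness definitions above are simple enough to verify by hand. The following base R sketch does so on a small digraph; geo_dist and closeness_score are hypothetical helper names, and the code is an illustration of the formulas rather than of sna's degree and closeness functions, whose defaults may differ.

```r
# Sketch of the degree and closeness definitions in base R (not sna code).

# Geodesic distances by Floyd-Warshall; Inf where no path exists
geo_dist <- function(adj) {
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(nrow(adj)))
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d
}

# Closeness: (n - 1) divided by the sum of distances from each vertex
closeness_score <- function(adj) (nrow(adj) - 1) / rowSums(geo_dist(adj))

adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 0, 0), 4, 4, byrow = TRUE)

outdeg <- rowSums(adj)       # |N+(v)|
indeg  <- colSums(adj)       # |N-(v)|
total  <- outdeg + indeg     # total ("Freeman") degree

closeness_score(adj)         # strongly connected: all scores positive

# Adding an isolate makes every distance sum infinite, so all scores are 0
adj2 <- rbind(cbind(adj, 0), 0)
closeness_score(adj2)
```

The second call shows the fragility noted in the text: a single isolated vertex is enough to drive every closeness score to zero.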

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
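The limiting behavior of the Bonacich measure can be checked numerically from the formula alone. The sketch below is a direct base R transcription of $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$; the function name bonacich is hypothetical, and the choice of $\alpha$ (scaling so that the sum of squared scores equals $|V|$) is an assumption of this sketch rather than a statement about bonpow's internals.

```r
# Direct transcription of the Bonacich power formula (illustrative only).
# Assumption: alpha is chosen so that the sum of squared scores equals |V|.
bonacich <- function(A, beta) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))  # (I - beta A)^{-1} A 1
  as.vector(raw * sqrt(n / sum(raw^2)))
}

# Undirected example: triangle 1-2-3 with a tail 3-4-5
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5))
A[edges] <- 1
A <- A + t(A)

# beta = 0: scores are proportional to degree, so the ratio is constant
b0 <- bonacich(A, 0)
b0 / rowSums(A)

# beta just below 1 / lambda_1: scores align with the principal eigenvector
eig <- eigen(A, symmetric = TRUE)
lambda1 <- eig$values[1]
ev <- abs(eig$vectors[, 1])
cor(bonacich(A, 0.9999 / lambda1), ev)

# Negative beta: ties to well-connected alters reduce a vertex's score
bonacich(A, -0.3)
```

A symmetric matrix is used here so that the eigendecomposition is real; for directed graphs the same formula applies, but the principal eigenvalue must be extracted with more care.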

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
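As a concrete reading of these definitions, the base R sketch below counts total brokerage by brute force and classifies a brokered triad by role. Both helpers (total_brokerage, gf_role) are hypothetical names introduced for illustration; the code follows the definitions as stated in the text and is not the implementation used by brokerage.

```r
# Brute-force illustration of the brokerage definitions (not sna code).

# Total brokerage: ordered pairs (a, b) with a -> v -> b and no a -> b tie
total_brokerage <- function(adj) {
  A <- adj > 0
  diag(A) <- FALSE
  sapply(seq_len(nrow(A)), function(v) {
    senders   <- which(A[, v])   # a with (a, v) in E
    receivers <- which(A[v, ])   # b with (v, b) in E
    count <- 0
    for (a in senders) for (b in receivers)
      if (a != b && !A[a, b]) count <- count + 1
    count
  })
}

# Role of broker j in the ordered triad (i, j, k), given the three states
gf_role <- function(si, sj, sk) {
  if (si == sj && sj == sk) "coordinating"
  else if (si == sk) "itinerant"        # si == sk, si != sj
  else if (sj == sk) "gatekeeping"      # sj == sk, si != sj
  else if (si == sj) "representative"   # si == sj, sj != sk
  else "liaison"                        # all three states differ
}

# Edges 1->2, 2->3, 2->4, 1->4: vertex 2 brokers 1 -> 3 but not 1 -> 4
g <- matrix(0, 4, 4)
g[rbind(c(1, 2), c(2, 3), c(2, 4), c(1, 4))] <- 1
memb <- c(1, 1, 2, 2)              # illustrative group memberships

total_brokerage(g)
gf_role(memb[1], memb[2], memb[3]) # role played by vertex 2 in triad (1, 2, 3)
```

Only vertex 2 brokers anything here, and only for the pair (1, 3), since the direct tie from 1 to 4 removes the second candidate pair.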

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))
Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
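The equivalence of equations (1) and (2), and the role of the theoretical maximum deviation in normalization, can be checked with simple arithmetic. The following base R sketch uses hand-entered degree scores; the function names are hypothetical, and the normalizing constant $(n - 1)(n - 2)$ is the standard maximum for undirected degree, attained by the star.

```r
# Arithmetic check of equations (1) and (2); helper names are hypothetical.
raw_centralization  <- function(scores) sum(max(scores) - scores)   # eq. (1)
raw_centralization2 <- function(scores)
  length(scores) * (max(scores) - mean(scores))                     # eq. (2)

star  <- c(3, 1, 1, 1)   # degree scores on the undirected star K_{1,3}
cycle <- c(2, 2, 2, 2)   # degree scores on the 4-cycle

raw_centralization(star)     # total deviation from the maximum score
raw_centralization2(star)    # same value via the maximum-minus-mean form
raw_centralization(cycle)    # all scores equal, so no centralization

# Normalize by the theoretical maximum deviation. For undirected degree on
# n vertices this is (n - 1) * (n - 2), which the star attains.
n <- length(star)
raw_centralization(star) / ((n - 1) * (n - 2))
```

A user-supplied centrality function plays the same two roles: it returns the score vector in ordinary use and the normalizing constant when asked for the maximum deviation.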

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
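The definition of the structure statistics can be checked by hand on a small graph. The following is a minimal base R sketch, not the sna implementation: the helper names geodesic_dist and structure_stats are hypothetical, and geodesics are computed by a plain Floyd-Warshall pass rather than by geodist.

```r
# Geodesic distances by Floyd-Warshall; Inf marks unreachable pairs.
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n)) {
    # Relax every (i, j) pair through intermediate vertex k.
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_dist(adj)
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed 4-cycle: 1 -> 2 -> 3 -> 4 -> 1.
cyc <- matrix(0, 4, 4)
cyc[cbind(1:4, c(2:4, 1))] <- 1
s_cyc <- structure_stats(cyc)
print(s_cyc)
```

Because the sum runs over all ordered pairs including $j = k$, $s_0$ always equals $1/N$, and the series reaches 1 only if every vertex can reach every other.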

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions consists of those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
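To make the dyad census and the one-tailed CUG p-values concrete, here is a minimal base R sketch. It is an illustration under stated assumptions rather than the code behind dyad.census or cugtest: the helpers dyad_census and cug_order are hypothetical names, and the null model conditions on order only, so each arc of $H$ is present independently with probability 0.5.

```r
# Dyad census of a directed graph from its 0/1 adjacency matrix.
dyad_census <- function(adj) {
  ut <- upper.tri(adj)
  a <- adj[ut]       # arc i -> j for each pair i < j
  b <- t(adj)[ut]    # arc j -> i for the same pairs
  c(Mut = sum(a == 1 & b == 1),
    Asym = sum(a != b),
    Null = sum(a == 0 & b == 0))
}

# Monte Carlo CUG test conditioning on order only: H is uniform over all
# digraphs on n vertices, i.e., every arc is present with probability 0.5.
cug_order <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  obs <- stat(adj)
  sim <- replicate(reps, {
    h <- matrix(rbinom(n * n, 1, 0.5), n, n)
    diag(h) <- 0
    stat(h)
  })
  c(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

set.seed(1)
g <- matrix(0, 6, 6)
g[cbind(c(1, 2, 1, 3, 4), c(2, 1, 3, 4, 5))] <- 1  # one mutual, three asymmetric dyads
census <- dyad_census(g)
res <- cug_order(g, function(a) dyad_census(a)[["Mut"]])
print(census)
print(res)
```

Conditioning on further statistics (density, dyad census) would only change how the replicate graphs h are drawn; the p-value computation stays the same.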

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012  102 021D 021U 021C 111D 111U 030T 030C  201 120D 120U
40.16 48.48 3.50 5.52 5.80 9.60 1.94 1.86 1.84 0.72 0.12 0.08 0.08
 120C  210  300
 0.30 0.00 0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102 021D 021U 021C 111D  111U 030T 030C  201 120D 120U
25.46 27.28 23.36 1.86 2.40 4.22 8.26 11.46 0.66 0.22 9.34 0.52 0.74
 120C  210  300
 1.34 2.28 0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102 021D 021U  021C  111D 111U 030T 030C  201 120D 120U
 4.66 22.62 10.06 4.82 5.00 12.74 10.78 9.02 9.72 2.56 3.26 3.88 3.60
 120C  210  300
 8.40 7.38 1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]

R> g4[2,,] <- g2[1,,]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
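The definition of exact structural equivalence translates directly into a comparison of adjacency matrix rows and columns. The sketch below is a base R illustration with hypothetical helper names (struct_equiv, positions); it is not how sna computes equivalence, and it assumes a 0/1 adjacency matrix without loops.

```r
# TRUE if v and w have identical in- and out-neighborhoods once each is
# removed from the other's neighborhood.
struct_equiv <- function(adj, v, w) {
  keep <- setdiff(seq_len(nrow(adj)), c(v, w))
  all(adj[v, keep] == adj[w, keep]) && all(adj[keep, v] == adj[keep, w])
}

# Label each vertex with the smallest index in its equivalence class.
positions <- function(adj) {
  n <- nrow(adj)
  cls <- seq_len(n)
  for (v in seq_len(n - 1)) for (w in (v + 1):n)
    if (struct_equiv(adj, v, w)) cls[w] <- cls[v]
  cls
}

# Undirected star: vertex 1 is the hub, vertices 2 to 4 are pendants.
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1
pos <- positions(star)
print(pos)
```

The three pendants share a position while the hub stands alone, so this four-vertex graph reduces to two positions.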

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
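The distance-then-cluster workflow described above can be sketched in base R as follows. This is an illustrative stand-in, not the sedist or equiv.clust code: se_hamming is a hypothetical helper that uses a Hamming distance on row and column profiles, and the handling of the entries linking the two compared vertices (simply ignored here) is an assumption.

```r
# Hamming distance between the row/column profiles of each vertex pair,
# ignoring the entries that involve the two vertices themselves.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (v in seq_len(n - 1)) for (w in (v + 1):n) {
    keep <- setdiff(seq_len(n), c(v, w))
    d[v, w] <- d[w, v] <- sum(adj[v, keep] != adj[w, keep]) +
      sum(adj[keep, v] != adj[keep, w])
  }
  d
}

# Vertices 1 to 3 send ties to vertices 4 to 6; there are no other ties.
g <- matrix(0, 6, 6)
g[1:3, 4:6] <- 1

d <- se_hamming(g)
hc <- hclust(as.dist(d), method = "average")  # stats::hclust, as used by equiv.clust
pos <- cutree(hc, k = 2)                      # cut the dendrogram into two positions
print(pos)
```

With real data the distances are rarely zero within a position, so the choice of cut height or number of clusters becomes a substantive modeling decision.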

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
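Density reduction and expansion can be illustrated with a short base R sketch. The helpers block_density and expand_blocks are hypothetical and are not the internals of blockmodel or blockmodel.expand; in particular, assigning density 0 to a block with no off-diagonal cells is an assumption made here for simplicity.

```r
# Density blockmodel: mean of the off-diagonal cells within each block.
block_density <- function(adj, cls) {
  k <- max(cls)
  dens <- matrix(NA_real_, k, k)
  off <- row(adj) != col(adj)
  for (a in seq_len(k)) for (b in seq_len(k)) {
    cells <- outer(cls == a, cls == b, "&") & off
    # Assumption: a block with no off-diagonal cells gets density 0.
    dens[a, b] <- if (any(cells)) mean(adj[cells]) else 0
  }
  dens
}

# Expansion: draw a Bernoulli digraph with class sizes `sizes`, using each
# block density as the tie probability for vertex pairs in that block.
expand_blocks <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), sizes)
  p <- dens[cls, cls, drop = FALSE]
  n <- length(cls)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0
  g
}

# Six vertices in two classes; class 1 sends ties to class 2 only.
g <- matrix(0, 6, 6)
g[1:3, 4:6] <- 1
dens <- block_density(g, c(1, 1, 1, 2, 2, 2))

set.seed(1)
g_new <- expand_blocks(dens, c(2, 2))  # new graph of order 4 instead of 6
print(dens)
print(g_new)
```

Since all densities here are exactly 0 or 1 the expanded graph is deterministic; with intermediate densities, repeated calls would give a Monte Carlo sample from the inhomogeneous Bernoulli model.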

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
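The boolean inner product characterization can be written in one line of base R. The function compose below is a hypothetical illustration of the definition, not the sna operator itself.

```r
# Boolean composition: (a %*% b)[v, w] counts the intermediaries u with
# v -> u in a and u -> w in b; any positive count yields an edge.
compose <- function(a, b) (a %*% b > 0) * 1

# A mutual dyad without loops ...
dyad <- matrix(c(0, 1,
                 1, 0), 2, 2, byrow = TRUE)
# ... composed with itself yields loops on both vertices.
dyad2 <- compose(dyad, dyad)

# Directed path 1 -> 2 -> 3: composing it with itself gives the single
# two-step edge 1 -> 3.
path <- matrix(0, 3, 3)
path[cbind(1:2, 2:3)] <- 1
path2 <- compose(path, path)
print(dyad2)
print(path2)
```

The first example shows why the diagonal matters: the composed graph consists only of loops.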

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| (|V| - 1)} \qquad (3)$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V| (|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
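For a pair of directed graphs on a common vertex set, the graph covariance, graph correlation, and Hamming distance reduce to a few lines of base R. The functions below (offdiag, gcov_sketch, gcor_sketch, hamming) are hypothetical names used to mirror the formulas, and are not the gcov, gcor, or hdist implementations.

```r
# Off-diagonal cells of an adjacency matrix, i.e., the edge variables of a
# loopless digraph; there are |V|(|V| - 1) of them.
offdiag <- function(a) a[row(a) != col(a)]

gcov_sketch <- function(g, h) {
  x <- offdiag(g)
  y <- offdiag(h)
  mean((x - mean(x)) * (y - mean(y)))  # divisor is |V|(|V| - 1), not one less
}

gcor_sketch <- function(g, h)
  gcov_sketch(g, h) / sqrt(gcov_sketch(g, g) * gcov_sketch(h, h))

hamming <- function(g, h) sum(offdiag(g) != offdiag(h))

# A directed path 1 -> 2 -> 3 -> 4 and its loopless complement.
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1
h <- 1 - g
diag(h) <- 0

print(gcov_sketch(g, g))  # graph variance of g
print(gcor_sketch(g, h))  # perfectly negatively correlated
print(hamming(g, h))      # every one of the 12 edge variables differs
```

The complement is the extreme case: every edge variable is flipped, so the Hamming distance is maximal and the correlation is at its lower bound.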

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
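The logic of a simple QAP test for graph correlation can be sketched in base R as below. This is a schematic illustration and not the qaptest implementation: qap_cor is a hypothetical helper, the example graphs are constructed deterministically for reproducibility, and only the vertex labels of the first graph are permuted.

```r
# QAP test of the graph correlation: the vertex labels of g are permuted
# (rows and columns together), which preserves the structure of g while
# breaking its alignment with h.
qap_cor <- function(g, h, reps = 1000) {
  off <- row(g) != col(g)
  n <- nrow(g)
  obs <- cor(g[off], h[off])
  sim <- replicate(reps, {
    p <- sample(n)
    cor(g[p, p][off], h[off])
  })
  list(obs = obs, p_ge = mean(sim >= obs), p_le = mean(sim <= obs))
}

set.seed(42)
# A deterministic 8-vertex digraph, and a copy with two added arcs.
g <- outer(1:8, 1:8, function(i, j) as.numeric((i * j) %% 3 == 1))
diag(g) <- 0
h <- g
h[1, 2] <- 1
h[3, 5] <- 1

res <- qap_cor(g, h, reps = 500)
print(res)
```

Because relabeling keeps the number and arrangement of ties in g intact, the reference distribution reflects only the loss of alignment between the two structures, which is what distinguishes QAP from a CUG test.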


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> nl <- netlm(y, x)

R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))

R> y <- rgraph(20, tprob = yp)

R> nl <- netlogit(y, x)

R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
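The simplest case of the error-rate parameterization, in which false positive and false negative rates are given as known, admits a closed-form posterior for each edge. The base R sketch below illustrates that special case only; edge_posterior is a hypothetical helper, and none of this reflects the Gibbs sampler used by bbnam when error rates must be estimated.

```r
# Posterior probability that an edge is present, given binary reports y
# from several informants with known false negative rates em and false
# positive rates ep, and a Bernoulli prior on the edge.
edge_posterior <- function(y, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(y == 1, 1 - em, em))  # Pr(reports | edge present)
  lik0 <- prod(ifelse(y == 1, ep, 1 - ep))  # Pr(reports | edge absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Three informants who often miss ties (em = 0.4) but rarely invent them
# (ep = 0.05). One positive report outweighs two negative ones.
em <- rep(0.4, 3)
ep <- rep(0.05, 3)
p_one_yes <- edge_posterior(c(1, 0, 0), em, ep)
p_all_no <- edge_posterior(c(0, 0, 0), em, ep)
print(c(p_one_yes, p_all_no))
```

The asymmetry between the two error rates is what drives the result: a positive report is strong evidence when false positives are rare, whereas a negative report is weak evidence when false negatives are common.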

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i, j, T)}$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
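The conditional tie probability is easy to evaluate directly. The following base R sketch simply transcribes the formula; bn_tie_prob and its argument names are hypothetical, and the numeric values (a baseline of 0.17 and a reciprocity bias of 0.5) are borrowed from the rgbn calls in the earlier example purely for illustration.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: baseline tie probability Pr(B_e)
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      vector of counts s_k of potential bias events for the dyad
bn_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline only: no bias event is possible for this dyad.
p0 <- bn_tie_prob(0.17, p_bias = 0.5, s = 0)

# One potential reciprocity bias event (the reverse tie already exists).
p1 <- bn_tie_prob(0.17, p_bias = 0.5, s = 1)

# Two bias types, the second of which may have occurred twice.
p2 <- bn_tie_prob(0.17, p_bias = c(0.5, 0.25), s = c(1, 2))
print(c(p0, p1, p2))
```

Each additional potential bias event can only raise the tie probability, which is how the biases generate reciprocity and clustering beyond the Bernoulli baseline.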

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

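$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$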

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
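Solving Equations 4 and 5 for $y$ gives a reduced form that can be simulated directly, which is a convenient way to see how the AR and MA terms act. The base R sketch below assumes a single weight matrix of each type ($w = z = 1$); sim_lnam is a hypothetical helper for generating data from the model, and it does not perform the estimation carried out by lnam.

```r
# Draw y from the network autocorrelation model with one AR and one MA
# weight matrix, using the reduced form
#   eps = (I - psi Z)^-1 nu,   y = (I - theta W)^-1 (X beta + eps).
sim_lnam <- function(W, Z, X, beta, theta, psi, sigma = 1) {
  n <- nrow(X)
  nu <- rnorm(n, sd = sigma)
  eps <- solve(diag(n) - psi * Z, nu)
  drop(solve(diag(n) - theta * W, X %*% beta + eps))
}

# Row-normalized ring on five vertices: each vertex averages its two
# neighbors.
W <- matrix(0, 5, 5)
W[cbind(1:5, c(2:5, 1))] <- 0.5
W[cbind(1:5, c(5, 1:4))] <- 0.5

X <- cbind(1, c(-2, -1, 0, 1, 2))  # intercept plus one covariate
beta <- c(1, 0.5)

set.seed(7)
y <- sim_lnam(W, W, X, beta, theta = 0.3, psi = 0.2)

# With sigma = 0 the disturbances vanish, so y must satisfy Equation 4
# exactly with eps = 0.
y0 <- sim_lnam(W, W, X, beta, theta = 0.3, psi = 0.2, sigma = 0)
print(y)
print(y0)
```

The inverse $(I - \theta W)^{-1}$ is what carries neighbors' covariates and disturbances into each response, whereas $(I - \psi Z)^{-1}$ acts on the disturbances alone.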


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)

R> ep <- rbeta(20, 1, 25)

R> em <- rbeta(20, 15, 25)

R> dat <- array(dim = c(20, 20, 20))

R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))

R> pnet <- matrix(0.5, ncol = 20, nrow = 20)

R> pem <- matrix(nrow = 20, ncol = 2)

R> pem[,1] <- 2

R> pem[,2] <- 11

R> pep <- matrix(nrow = 20, ncol = 2)

R> pep[,1] <- 2

R> pep[,2] <- 11

R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)

R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20  Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
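To see why the region with em + ep > 1 is treated as problematic, note that under the simulation model used in this example an informant reports a tie with probability 1 - em when the tie exists and with probability ep when it does not. The base R sketch below computes the resulting likelihood ratio for a positive report; the function name report_lr and the specific rates are assumptions chosen for illustration and are not part of sna.

```r
# Likelihood ratio of a positive report: P(report | tie) / P(report | no tie)
report_lr <- function(em, ep) (1 - em) / ep

lr_typical  <- report_lr(em = 0.375, ep = 0.038)  # near the mean rates simulated in the example
lr_boundary <- report_lr(em = 0.6,   ep = 0.4)    # em + ep equals 1: a report is uninformative
lr_perverse <- report_lr(em = 0.7,   ep = 0.5)    # em + ep exceeds 1: a report counts against a tie
```

Once em + ep exceeds 1, the ratio falls below 1, so a reported tie becomes evidence that the tie is absent; priors concentrated in that region therefore invert the meaning of the data.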

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
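The contrast between the intersection LAS and the central graph can be sketched in base R without sna. In the sketch below, the central graph is taken to be the edgewise majority vote across informants, and the intersection LAS is taken to keep the (i, j) tie only when both informant i and informant j report it; these readings, the fixed error rates (0.3 false negative, 0.05 false positive), and all variable names are assumptions for illustration, not the consensus function's actual implementation.

```r
set.seed(1)
n <- 20                                    # actors; each actor is also an informant
g <- matrix(rbinom(n * n, 1, 0.5), n, n)   # hypothetical true digraph
diag(g) <- 0
em <- 0.3                                  # assumed false negative rate
ep <- 0.05                                 # assumed false positive rate

# dat[i, , ] holds informant i's error-prone report of the whole network
dat <- array(0, dim = c(n, n, n))
for (i in 1:n) {
  p <- g * (1 - em) + (1 - g) * ep         # probability of reporting each tie
  r <- matrix(rbinom(n * n, 1, p), n, n)
  diag(r) <- 0
  dat[i, , ] <- r
}

# Central graph: a tie is kept when a majority of informants report it
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# Intersection LAS: the (i, j) tie is kept only if both i and j report it
las_int <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n)
  las_int[i, j] <- dat[i, i, j] * dat[j, i, j]

off <- row(g) != col(g)                    # compare off-diagonal cells only
acc_central <- mean(central[off] == g[off])
acc_las <- mean(las_int[off] == g[off])
```

Requiring agreement between two informants who each miss a sizeable share of true ties compounds the false negatives, whereas pooling all informants averages much of the error away as long as individual error rates stay well below 0.5.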

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press, Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


is further possible to specify the use of incoming, outgoing, or combined neighborhoods for generating the induced subgraphs.

While ego.extract is useful for assessing local structural properties, it does not provide for computation on attributes (i.e., exogenous covariates) of vertex neighbors. This functionality is supplied by gapply. For each vertex in its input set, gapply first identifies all members of its neighborhood; neighborhoods may be in, out, or combined, and higher-order neighborhoods may be selected (as discussed below). Once each neighborhood has been identified, gapply applies a user-specified function to the neighbors' covariates (which may be supplied as a numeric vector). This provides a very quick and easy way to calculate properties such as the size of a given vertex's 3rd-order neighborhood, the fraction of its alters with a given characteristic, the average value of its alters on a specified covariate, etc.

In addition to the above, it is sometimes useful to be able to examine more complex neighborhood structures in their own right (e.g., as hypothetical influence matrices for network autocorrelation modeling). neighborhood provides for such computations, returning for a given graph the adjacency matrix whose i, j cell is an indicator for the membership of vertex j in vertex i's selected neighborhood. Specifically, the adjacency matrix associated with the 0th order neighborhood is defined as the identity matrix, and for orders k > 0 depends on the type of adjacency involved. For input graph $G = (V, E)$, let the base relation R be given by the underlying graph of G (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of G if incoming neighborhoods are sought, or G otherwise. The partial neighborhood structure of order k > 0 on R is then defined to be the digraph on V whose edge set consists of the ordered pairs (i, j) having geodesic distance k in R. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to k in R. neighborhood computes either partial or cumulative neighborhoods of arbitrary order, and with arbitrary choice of edge direction.
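The definition above can be sketched in base R. The helpers geo_dist and nbhd below are hypothetical names, not sna functions; geodesic distances are obtained by repeated boolean matrix products (adequate for small graphs only), and the cumulative neighborhood is assumed here to exclude ego itself (distances 1 through k), which may differ from what neighborhood returns.

```r
# Geodesic distances on a 0/1 adjacency matrix via repeated boolean products
geo_dist <- function(r) {
  n <- nrow(r)
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  walk <- diag(n)                        # walks of length 0
  for (k in seq_len(n - 1)) {
    walk <- (walk %*% r > 0) * 1         # pairs joined by some walk of length k
    first <- walk == 1 & is.infinite(d)  # pairs reached for the first time
    d[first] <- k
  }
  d
}

# Partial or cumulative neighborhood of order k on the chosen base relation
nbhd <- function(g, k, type = c("out", "in", "total"), partial = TRUE) {
  type <- match.arg(type)
  r <- switch(type,
              out = g,                        # G itself
              "in" = t(g),                    # transpose of G
              total = (g + t(g) > 0) * 1)     # underlying graph of G
  if (k == 0) return(diag(nrow(g)))           # 0th order: identity matrix
  d <- geo_dist(r)
  if (partial) (d == k) * 1 else (d >= 1 & d <= k) * 1
}

# Toy example (hypothetical): directed path 1 -> 2 -> 3 -> 4
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1
p2_out <- nbhd(g, 2, "out")                   # pairs (1, 3) and (2, 4)
c2_out <- nbhd(g, 2, "out", partial = FALSE)  # all pairs at distance 1 or 2
p1_in  <- nbhd(g, 1, "in")                    # equals the transpose of g
```

On the directed path, the order-2 partial out-neighborhood contains only the pairs two steps apart, while the cumulative version also retains the original ties.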

To illustrate sna's egocentric network tools, we begin by generating a sample network and extracting ego nets based on in, out, and combined neighborhoods. The resulting lists of ego nets are then easily subjected to other analyses, as seen below.

R> g <- rgraph(10, tp = 1.5 / 9)
R> g.in <- ego.extract(g, neighborhood = "in")
R> g.out <- ego.extract(g, neighborhood = "out")
R> g.comb <- ego.extract(g, neighborhood = "combined")
R> g.comb[1:3]

$`1`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    0
[3,]    0    0    0    0
[4,]    1    0    0    0

$`2`
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)

[1] TRUE

R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)

[1] TRUE

R> all(sapply(g.comb, NROW) <= degree(g) + 1)

[1] TRUE

R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x) gden(x[-1, -1]))

         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
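The alter-only density described here can be reproduced in a few lines of base R. In the sketch below, ego_net and alter_density are hypothetical helpers (not sna functions) that mimic the convention of placing ego in the first row/column; the density is computed for a directed graph without loops, which is an assumption of this illustration.

```r
# Extract the ego net of vertex v from adjacency matrix g, with ego placed first
ego_net <- function(g, v, type = c("combined", "in", "out")) {
  type <- match.arg(type)
  alters <- switch(type,
                   out = which(g[v, ] > 0),
                   "in" = which(g[, v] > 0),
                   combined = which(g[v, ] > 0 | g[, v] > 0))
  idx <- c(v, setdiff(alters, v))
  g[idx, idx, drop = FALSE]
}

# Density among alters only (directed, no loops); NA when fewer than two alters
alter_density <- function(x) {
  k <- nrow(x) - 1
  if (k < 2) return(NA_real_)
  a <- x[-1, -1, drop = FALSE]         # drop ego's row and column
  diag(a) <- 0
  sum(a) / (k * (k - 1))
}

# Toy example (hypothetical): ties 1 -> 2, 1 -> 3, 2 -> 3; vertex 4 is isolated
g <- matrix(0, 4, 4)
g[1, 2] <- g[1, 3] <- g[2, 3] <- 1
e1 <- ego_net(g, 1)                    # ego 1 with alters 2 and 3
dens1 <- alter_density(e1)             # one of the two possible alter-alter ties
```

Dropping the first row and column removes exactly the ties that exist by construction, so the remaining density reflects only how connected ego's alters are to one another.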

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))

[1] TRUE

R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))

[1] TRUE

R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)

[1] TRUE

R> gapply(g, c(1, 2), 1:6, mean)

[1] 4.00 3.00 3.00 5.50 3.25 3.25

R> gapply(g, c(1, 2), 1:6, mean, distance = 2)

[1] 4.0 3.8 3.6 3.4 3.2 3.0
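The idea behind this kind of neighborhood computation can be sketched in base R for the simplest case of first-order out-neighborhoods. The helper nbhd_apply, the toy graph, and the age covariate are assumptions for illustration; gapply itself supports in, out, and combined neighborhoods as well as higher orders.

```r
# Apply FUN to the covariate values of each vertex's out-neighbors
nbhd_apply <- function(g, x, FUN) {
  sapply(seq_len(nrow(g)), function(i) FUN(x[g[i, ] > 0]))
}

# Toy example (hypothetical): ties 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 4
g <- matrix(0, 4, 4)
g[1, 2] <- g[1, 3] <- g[2, 3] <- g[3, 4] <- 1
age <- c(30, 40, 50, 60)                     # hypothetical vertex covariate

out_deg  <- nbhd_apply(g, rep(1, 4), sum)    # counting neighbors gives outdegree
mean_age <- nbhd_apply(g, age, mean)         # mean covariate among out-neighbors
```

Vertices with no neighbors yield the function's value on an empty vector (NaN for mean), so callers may want to handle that case explicitly.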

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2 / 9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.

Figure 1: Sample partial neighborhoods of increasing order (panels for orders 1 through 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order partial neighborhood of v.

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels for orders 1 through 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order cumulative neighborhood of v.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)

Figure 3: Sample visualizations using gplot with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{dt}(v,G) \equiv c_{d+}(v,G) + c_{d-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by $c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')}$, where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
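The degree definitions above can be checked by hand on a small digraph. The following is a minimal base-R sketch (it does not call sna, and the example matrix is purely illustrative), assuming a 0/1 adjacency matrix in which A[i, j] = 1 denotes an edge from vertex i to vertex j.

```r
# Base-R sketch: degree indices from a 0/1 adjacency matrix,
# where A[i, j] = 1 means there is an edge from vertex i to vertex j.
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[1, 3] <- 1; A[2, 3] <- 1; A[4, 1] <- 1

outdeg <- rowSums(A)      # |N+(v)|: number of edges sent by v
indeg  <- colSums(A)      # |N-(v)|: number of edges received by v
total  <- outdeg + indeg  # total ("Freeman") degree

print(rbind(outdeg, indeg, total))
```

Within sna itself, the corresponding values are obtained from degree with the appropriate cmode setting, as illustrated later in this section.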

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
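To make the Bonacich formula concrete, the following base-R sketch evaluates $\alpha (I - \beta A)^{-1} A \mathbf{1}$ directly. It is an illustration rather than the bonpow implementation: the function name bonacich_power is hypothetical, and we assume that the scaling convention means a squared Euclidean length of $|V|$. That assumption is consistent with the example output later in this section, where the ratio of the $\beta = 0$ scores to outdegree is $\sqrt{10/208} \approx 0.2192645$ for the outdegrees shown there.

```r
# Hypothetical helper (not part of sna): Bonacich power scores for a 0/1
# adjacency matrix A and decay parameter beta.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  # Unscaled scores: solve (I - beta * A) x = A 1 for x
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))
  # Assumed convention: choose alpha so that the sum of squared scores is |V|
  alpha <- sqrt(n / sum(raw^2))
  as.vector(alpha * raw)
}

A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 0, 0), 4, 4, byrow = TRUE)

# With beta = 0 the scores reduce to a rescaled outdegree
cbp0  <- bonacich_power(A, beta = 0)
has_out <- rowSums(A) > 0
ratio <- cbp0[has_out] / rowSums(A)[has_out]
print(ratio)  # constant across vertices with positive outdegree
```

A negative beta can be passed in the same way, provided that $I - \beta A$ remains invertible.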

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
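The role definitions above translate directly into a brute-force count. The sketch below is a base-R illustration only (the function brokerage_counts is hypothetical and is not the sna brokerage routine); it enumerates ordered triads and classifies each brokered pair by the states of its three vertices, without computing the null-model moments or z-scores.

```r
# Hypothetical helper: raw Gould-Fernandez brokerage counts by role.
# A is a 0/1 adjacency matrix; s is a vector of vertex states.
brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper",
             "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # v_j brokers (v_i, v_k) when i -> j and j -> k exist but i -> k does not
    if (A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0) {
      role <- if (s[i] == s[j] && s[j] == s[k]) {
        "coordinator"
      } else if (s[i] == s[k]) {
        "itinerant"
      } else if (s[j] == s[k]) {
        "gatekeeper"
      } else if (s[i] == s[j]) {
        "representative"
      } else {
        "liaison"
      }
      out[j, role] <- out[j, role] + 1
    }
  }
  cbind(out, total = rowSums(out))
}

# Directed path 1 -> 2 -> 3, with vertices 1 and 2 sharing a state
A <- matrix(0, 3, 3); A[1, 2] <- 1; A[2, 3] <- 1
s <- c(1, 1, 2)
res <- brokerage_counts(A, s)
print(res)
```

Here vertex 2 brokers the single ordered pair (1, 3), and since $s_1 = s_2 \neq s_3$ the triad falls under the representative role as defined above.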

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")

 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")

 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)

 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)

 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)

 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)

 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)

 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)

 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)

 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")

 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)

[1] TRUE

R> bonpow(dat, exponent = -0.5)

 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j), (j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j), (j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores [2]) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$) [3], although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

[2] For instance, when all vertices are automorphically equivalent.
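As a quick numerical check of the two forms of the centralization formula, the following base-R sketch computes both on an out-star using outdegree as the centrality measure. It is illustrative only and does not call centralization; the normalizing constant $(n-1)^2$ is our assumption for the maximum total outdegree deviation on a loopless digraph of order $n$, which the out-star attains.

```r
# Out-star on 5 vertices: vertex 1 sends an edge to every other vertex
A <- matrix(0, 5, 5)
A[1, 2:5] <- 1
cent <- rowSums(A)   # outdegree as the underlying centrality measure
n <- length(cent)

C1 <- sum(max(cent) - cent)            # Equation (1): total deviation from max
C2 <- n * (max(cent) - mean(cent))     # Equation (2): |V| * (max - mean)

# Assumed theoretical maximum deviation for outdegree without loops
C_norm <- C1 / (n - 1)^2
print(c(C1 = C1, C2 = C2, normalized = C_norm))
```

Both forms agree, and the normalized score equals 1 for this maximally concentrated structure.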

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)

[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)

[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")

[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)

[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)

[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")

[1]   0  21 106 254 582

R> connectedness(g)

[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)

[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")

[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)

[1] 0.2 1.0 1.0 1.0 1.0

[3] $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
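The difference between dyadic and edgewise reciprocity is easy to see on a very small digraph. The base-R sketch below (illustrative only, without calls to sna) counts mutual, asymmetric, and null dyads and forms both ratios; we take edgewise reciprocity to be the fraction of edges that are reciprocated, i.e., $2M / (2M + A)$ for $M$ mutual and $A$ asymmetric dyads.

```r
# Four vertices, one mutual dyad (1,2) and one asymmetric dyad (1,3)
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[1, 3] <- 1

n  <- nrow(A)
ut <- upper.tri(A)                    # visit each unordered pair once
mutual <- sum((A * t(A))[ut])         # both directions present
asym   <- sum((A != t(A))[ut])        # exactly one direction present
null   <- choose(n, 2) - mutual - asym

dyadic   <- (mutual + null) / choose(n, 2)     # fraction of symmetric dyads
edgewise <- 2 * mutual / (2 * mutual + asym)   # fraction of reciprocated edges
print(c(dyadic = dyadic, edgewise = edgewise))
```

Because null dyads count as symmetric, sparse graphs can show high dyadic reciprocity even when few edges are reciprocated, as in the first graph of the example above.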

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
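The structure statistics can be computed from first principles once geodesic distances are available. The following base-R sketch is an illustration under our own naming (geodesic_distances and structure_stats are hypothetical helpers, not sna functions); it obtains distances with a Floyd-Warshall pass and then applies the definition of $s_i$ given above.

```r
# Hypothetical helper: all-pairs geodesic distances via Floyd-Warshall.
# Unreachable pairs keep distance Inf; d(i, i) = 0.
geodesic_distances <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (m in seq_len(n))
    for (i in seq_len(n))
      for (j in seq_len(n))
        D[i, j] <- min(D[i, j], D[i, m] + D[m, j])
  D
}

# Hypothetical helper: s_i = N^-2 * #{(j, k) : d(j, k) <= i}, i = 0..N-1
structure_stats <- function(A) {
  N <- nrow(A)
  D <- geodesic_distances(A)
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  names(s) <- 0:(N - 1)
  s
}

# Directed 3-cycle: 1 -> 2 -> 3 -> 1
A <- matrix(0, 3, 3)
A[1, 2] <- 1; A[2, 3] <- 1; A[3, 1] <- 1
ss <- structure_stats(A)
print(ss)
```

For the directed 3-cycle, each vertex reaches itself at distance 0, one other vertex at distance 1, and the last at distance 2, so the series is 1/3, 2/3, 1. In sna, geodist and structure.statistics supply the corresponding quantities.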

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G)$, $s'(H) = s'(G)$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G)$, $\mathbf{E}s'(H) = s'(G)$ for some $s$, $s'$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s$, $s'$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
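The logic of a CUG test can be sketched in a few lines of base R. The example below is an illustration only, not the cugtest implementation: the helpers count_mutual, rand_digraph, and cug_sketch are hypothetical, the test statistic is the number of mutual dyads, and the null distribution conditions on order and edge count by placing the observed number of edges uniformly at random among the off-diagonal cells.

```r
set.seed(1)

# Number of mutual dyads in a loopless 0/1 digraph
count_mutual <- function(A) sum(A * t(A)) / 2

# Uniform draw from loopless digraphs with n vertices and exactly m edges
rand_digraph <- function(n, m) {
  A <- matrix(0, n, n)
  off_diag <- which(row(A) != col(A))
  A[sample(off_diag, m)] <- 1
  A
}

# Monte Carlo estimates of the upper and lower one-tailed CUG p-values
cug_sketch <- function(A, stat, reps = 1000) {
  obs  <- stat(A)
  sims <- replicate(reps, stat(rand_digraph(nrow(A), sum(A))))
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

# Six vertices arranged as three fully reciprocated pairs
A <- matrix(0, 6, 6)
A[1, 2] <- A[2, 1] <- A[3, 4] <- A[4, 3] <- A[5, 6] <- A[6, 5] <- 1
res <- cug_sketch(A, count_mutual)
print(res)
```

A small upper-tail estimate here would indicate more mutuality than expected for graphs of the same order and edge count; conditioning on order alone would instead require drawing each off-diagonal cell independently with probability 1/2.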

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:   0
        Mean:  -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
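The logic of a CUG test conditioned on order can be sketched in a few lines of base R. This is an illustration only, not the cugtest implementation: the helper names (recip_sketch, rgraph_order, cug_sketch) are hypothetical, and reciprocity is taken here to be the share of dyads that are symmetric, which is an assumption about the statistic rather than a statement about grecip's defaults.

```r
# Minimal base-R sketch of a conditional uniform graph (CUG) test.
# All helper names are hypothetical; none of them are sna functions.

# Dyadic reciprocity: share of dyads that are symmetric (mutual or null).
recip_sketch <- function(a) {
  ut <- upper.tri(a)
  mean(a[ut] == t(a)[ut])
}

# Draw a digraph uniformly given its order n (each edge present w.p. 0.5).
rgraph_order <- function(n) {
  a <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(a) <- 0
  a
}

# Compare the observed difference in reciprocity with its CUG distribution.
cug_sketch <- function(a1, a2, reps = 1000) {
  obs <- recip_sketch(a1) - recip_sketch(a2)
  n <- nrow(a1)
  sim <- replicate(reps,
    recip_sketch(rgraph_order(n)) - recip_sketch(rgraph_order(n)))
  list(obs = obs, p.ge = mean(sim >= obs), p.le = mean(sim <= obs))
}

set.seed(1)
res <- cug_sketch(rgraph_order(10), rgraph_order(10), reps = 200)
res$p.ge  # estimated p(f(rnd) >= f(d))
res$p.le  # estimated p(f(rnd) <= f(d))
```

Conditioning on density as well would only require replacing rgraph_order with a generator that fixes the number of edges.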

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph-theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
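As a rough illustration of the Hamming-distance variant of this idea, the following base-R sketch compares the ties of each pair of vertices to all third parties and then clusters the result. The helper se_hamming is hypothetical and is not the sedist implementation; in particular, ignoring the ties between the two vertices being compared is an assumption made here for simplicity.

```r
# Hamming distance between the tie profiles of every pair of vertices.
# se_hamming() is a hypothetical helper written for this sketch.
se_hamming <- function(a) {
  n <- nrow(a)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    k <- setdiff(seq_len(n), c(i, j))      # third parties only
    d[i, j] <- sum(a[i, k] != a[j, k]) +   # differences in out-ties
               sum(a[k, i] != a[k, j])     # differences in in-ties
  }
  d
}

# Undirected star: vertex 1 is the hub; vertices 2-4 are pendants.
a <- matrix(0, 4, 4)
a[1, 2:4] <- 1
a[2:4, 1] <- 1

d <- se_hamming(a)
hc <- hclust(as.dist(d), method = "complete")
cl <- cutree(hc, k = 2)   # hub versus the structurally equivalent pendants
```

The three pendants have identical neighborhoods, so their pairwise distances are zero and they fall into a single position.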


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the i, jth block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
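The density block type is simple enough to compute by hand. The sketch below, in base R, takes an adjacency matrix and a membership vector and returns the matrix of block densities; block_density is a hypothetical helper, not part of sna, and it assumes that diagonal cells are to be ignored.

```r
# block_density() is a hypothetical helper: mean cell value per block,
# ignoring the diagonal (self-ties).
block_density <- function(a, memb) {
  diag(a) <- NA
  k <- sort(unique(memb))
  bm <- matrix(NA_real_, length(k), length(k),
               dimnames = list(as.character(k), as.character(k)))
  for (r in seq_along(k)) for (s in seq_along(k))
    bm[r, s] <- mean(a[memb == k[r], memb == k[s]], na.rm = TRUE)
  bm
}

# Complete bipartite graph on classes {1, 2} and {3, 4}.
a <- matrix(0, 4, 4)
a[1:2, 3:4] <- 1
a[3:4, 1:2] <- 1

bm <- block_density(a, memb = c(1, 1, 2, 2))
bm   # within-class blocks are null; between-class blocks are complete
```

Because the two classes here are exact structural equivalence classes, every block is filled entirely with 0s or entirely with 1s, as described above.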

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
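Density-based expansion amounts to drawing from an inhomogeneous Bernoulli graph whose parameter matrix is the block image replicated by class. A minimal base-R sketch under that reading follows; expand_density is a hypothetical helper and makes no claim to match the internals of blockmodel.expand.

```r
# expand_density() is a hypothetical helper: draw a digraph from a density
# block image, placing sizes[r] vertices in block r.
expand_density <- function(bm, sizes) {
  memb <- rep(seq_len(nrow(bm)), times = sizes)
  p <- bm[memb, memb, drop = FALSE]        # vertex-level edge probabilities
  n <- length(memb)
  a <- matrix(rbinom(n * n, 1, p), n, n)   # independent Bernoulli draws
  diag(a) <- 0                             # no loops
  a
}

set.seed(2)
image <- matrix(c(0.9, 0.1,
                  0.1, 0.9), 2, 2, byrow = TRUE)
g.sim <- expand_density(image, sizes = c(3, 5))   # one simulated draw
dim(g.sim)
```

Calling expand_density repeatedly yields a Monte Carlo sample from the hypothesized image; the sizes argument need not match the order of any observed graph.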

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
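The boolean inner product is easy to reproduce with ordinary matrix multiplication. The following base-R sketch uses a hypothetical helper, compose_sketch, rather than sna's own operator, and also shows how loops can arise from loopless inputs.

```r
# compose_sketch() is a hypothetical helper: boolean product of two
# adjacency matrices defined on the same vertex set.
compose_sketch <- function(a, b) (a %*% b > 0) * 1

# Directed 3-cycle: 1 -> 2 -> 3 -> 1.
g <- matrix(0, 3, 3)
g[1, 2] <- g[2, 3] <- g[3, 1] <- 1
gg <- compose_sketch(g, g)     # two-step ties: 1 -> 3, 2 -> 1, 3 -> 2

# A single mutual dyad has no loops, but its self-composition does.
m <- matrix(c(0, 1,
              1, 0), 2, 2)
mm <- compose_sketch(m, m)
diag(mm)                       # both diagonal entries are 1
```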

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G$, $H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}$$


where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
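For loopless digraphs on a common vertex set, these quantities reduce to ordinary moments of the off-diagonal cells. The base-R sketch below spells this out; the helper names (offdiag, gcov_sketch, gcor_sketch, hamming_sketch) are hypothetical and are not the sna functions gcov, gcor, or hdist.

```r
# Hypothetical helpers illustrating the graph covariance, graph correlation,
# and Hamming distance for loopless digraphs on the same vertex set.
offdiag <- function(a) a[row(a) != col(a)]     # the |V|(|V| - 1) edge variables

gcov_sketch <- function(a, b) {
  x <- offdiag(a)
  y <- offdiag(b)
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}

gcor_sketch <- function(a, b)
  gcov_sketch(a, b) / sqrt(gcov_sketch(a, a) * gcov_sketch(b, b))

hamming_sketch <- function(a, b) sum(offdiag(a) != offdiag(b))

set.seed(3)
a <- matrix(rbinom(36, 1, 0.5), 6, 6); diag(a) <- 0
b <- matrix(rbinom(36, 1, 0.5), 6, 6); diag(b) <- 0

gcor_sketch(a, b)      # graph correlation
hamming_sketch(a, b)   # number of edge changes taking a into b
```

Since the divisor cancels in the ratio, the graph correlation coincides with the ordinary product-moment correlation of the vectorized off-diagonal cells.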

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
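For very small graphs, the optimal matching can be found by brute force, which makes the definition concrete. The base-R sketch below enumerates all vertex relabelings and takes the minimum Hamming distance, assuming that every vertex is exchangeable with every other; perms and structdist_sketch are hypothetical helpers, not the sna routines, and the approach is only feasible for a handful of vertices.

```r
# All permutations of 1..n as rows of a matrix (recursive; tiny n only).
perms <- function(n) {
  if (n == 1) return(matrix(1, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(k)
    cbind(k, ifelse(p >= k, p + 1, p))))
}

# Structural (unlabeled) Hamming distance by exhaustive search, assuming
# all vertices are mutually exchangeable.
structdist_sketch <- function(a, b) {
  p <- perms(nrow(a))
  min(apply(p, 1, function(o) sum(a != b[o, o])))
}

# Directed path 1 -> 2 -> 3 -> 4, and a relabeled copy of it.
a <- matrix(0, 4, 4)
a[1, 2] <- a[2, 3] <- a[3, 4] <- 1
o <- c(3, 1, 4, 2)
b <- a[o, o]

sum(a != b)                # labeled Hamming distance is positive
structdist_sketch(a, b)    # structural distance is 0: the graphs are isomorphic
```

The difference between the two values is the "labeling" component of the distance for this pair.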

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
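The QAP idea can be illustrated with a short base-R sketch: the vertices of one graph are repeatedly relabeled, and the observed statistic is compared with the resulting permutation distribution. The helper qap_sketch is hypothetical and is not the qaptest implementation; the graph correlation is used as the test statistic purely as an example.

```r
# qap_sketch() is a hypothetical helper: a simple QAP test of the graph
# correlation between two loopless digraphs on the same vertex set.
qap_sketch <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)
  stat <- function(x, y) cor(x[off], y[off])
  obs <- stat(a, b)
  n <- nrow(a)
  sim <- replicate(reps, {
    o <- sample(n)          # random relabeling of the vertices of a
    stat(a[o, o], b)        # rows and columns are permuted together
  })
  list(obs = obs, p.ge = mean(sim >= obs), p.le = mean(sim <= obs))
}

set.seed(4)
a <- matrix(rbinom(100, 1, 0.5), 10, 10); diag(a) <- 0
b <- a
flip <- sample(100, 10)     # perturb a few cells to obtain a noisy copy
b[flip] <- 1 - b[flip]
diag(b) <- 0

res <- qap_sketch(a, b, reps = 500)
res$obs
res$p.ge
```

Because rows and columns are permuted jointly, each replicate preserves the structure of the permuted graph exactly while breaking its alignment with the other graph.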


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
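To make the reporting process concrete, the following base-R sketch simulates error-prone reports on a known graph and recovers it with a cell-wise majority rule, which is the graph minimizing the total Hamming distance to the reports (i.e., a central graph). The error rates and sizes are arbitrary illustrative assumptions, and nothing here reproduces the consensus or bbnam implementations.

```r
# Simulate m noisy reports on a true digraph g, then form the cell-wise
# majority (central) graph. All values below are illustrative assumptions.
set.seed(5)
n <- 8       # number of vertices
m <- 15      # number of informants
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0

em <- 0.30   # assumed false negative rate, common to all informants
ep <- 0.05   # assumed false positive rate, common to all informants

reports <- replicate(m, {
  p <- g * (1 - em) + (1 - g) * ep      # Pr(report an edge | true state)
  r <- matrix(rbinom(n * n, 1, p), n, n)
  diag(r) <- 0
  r
})                                      # an n x n x m array

central <- (apply(reports, c(1, 2), mean) > 0.5) * 1
acc <- mean(central == g)               # share of cells recovered correctly
acc
```

With error rates well below 0.5, the majority rule recovers most cells; the model-based estimators described above are intended for the harder cases in which it does not.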

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
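The conditional nomination probability is straightforward to evaluate once the baseline probability, the bias event probabilities, and the counts $s_k$ are given. A one-function base-R sketch follows; bn_prob and its argument names are hypothetical, and the numbers are arbitrary.

```r
# bn_prob() is a hypothetical helper evaluating
#   Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# base:   baseline probability Pr(B_e)
# bias_p: vector of bias event probabilities Pr(B_k)
# s:      vector of counts s_k(i, j, T), one per bias event
bn_prob <- function(base, bias_p, s) 1 - (1 - base) * prod((1 - bias_p)^s)

# Two bias events, only the first of which is potentially active.
bn_prob(base = 0.1, bias_p = c(0.5, 0.2), s = c(1, 0))   # 1 - 0.9 * 0.5 = 0.55

# With no active bias events, only the baseline probability remains.
bn_prob(base = 0.1, bias_p = c(0.5, 0.2), s = c(0, 0))   # 0.1
```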

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

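$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}$$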

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
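Equations (4) and (5) can be solved for $y$ directly, which gives a simple way to simulate from the model. The base-R sketch below does so for a single AR and a single MA matrix, and also computes Moran's I in its usual textbook form; all parameter values are arbitrary assumptions, moran_I is a hypothetical helper, and neither part reproduces lnam or nacf.

```r
# Simulate from a linear network autocorrelation model with one AR matrix (W)
# and one MA matrix (Z). All settings are illustrative assumptions.
set.seed(6)
n <- 30
W <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(W) <- 0
Z <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(Z) <- 0
X <- matrix(rnorm(n * 2), n, 2)
beta  <- c(1, -1)
theta <- 0.05
psi   <- 0.05
sigma <- 0.1

nu  <- rnorm(n, 0, sigma)
eps <- solve(diag(n) - psi * Z, nu)                   # Equation (5) solved for eps
y   <- solve(diag(n) - theta * W, X %*% beta + eps)   # Equation (4) solved for y

# Residual of Equation (4); should be numerically zero.
resid4 <- max(abs(y - (theta * W %*% y + X %*% beta + eps)))

# Moran's I for responses y under weight matrix W (hypothetical helper).
moran_I <- function(y, W) {
  z <- as.vector(y) - mean(y)
  (length(z) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
moran_I(y, W)
```

The same reduced form underlies simulation with several weight matrices: the sums in Equations (4) and (5) are simply accumulated before solving.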


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20   Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which $e^- + e^+ > 1$. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error Z value Pr(>|z|)
X1     -0.331259  0.010831   -30.58  <2e-16 ***
X2      0.535608  0.009448    56.69  <2e-16 ***
X3     -0.685068  0.007138   -95.98  <2e-16 ***
X4      0.691812  0.008417    82.19  <2e-16 ***
X5      0.016491  0.007890     2.09  0.0366 *
rho1.1  0.194935  0.002575    75.71  <2e-16 ***
rho2.1  0.307491  0.021167    14.53  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
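The simulation used in this example draws data from a model with an autoregressive term in the response and a moving-average term in the disturbance: $y = \rho_1 W_1 y + X\beta + e$ with $e = \rho_2 W_2 e + \nu$. The base R sketch below repeats that construction on a small deterministic pair of weight matrices and checks that the generated values satisfy both equations. The ring-shaped weight matrices, the seed, and the reduced dimensions are illustrative assumptions, chosen only so that the linear systems are guaranteed to be solvable.

```r
set.seed(42)                          # arbitrary seed, for reproducibility
n  <- 10
w1 <- matrix(0, n, n)
w1[cbind(1:n, c(2:n, 1))] <- 1        # directed ring: i -> i + 1
w2 <- t(w1)                           # the same ring, reversed

x    <- matrix(rnorm(n * 2), n, 2)    # two covariates
beta <- c(1, -1)
r1   <- 0.2                           # AR parameter
r2   <- 0.3                           # MA parameter
nu   <- rnorm(n, 0, 0.1)              # white-noise innovations

# Solve (I - r2 W2) e = nu, then (I - r1 W1) y = X beta + e.
e <- qr.solve(diag(n) - r2 * w2, nu)
y <- qr.solve(diag(n) - r1 * w1, x %*% beta + e)

# Substituting back should recover the disturbances and the innovations.
ar_resid <- y - r1 * w1 %*% y - x %*% beta   # should equal e
ma_resid <- e - r2 * w2 %*% e                # should equal nu
all.equal(as.vector(ar_resid), as.vector(e))
all.equal(as.vector(ma_resid), nu)
```

Solving the two linear systems in this order is what lets a single draw of white noise produce a response with both kinds of network dependence.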

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.


[3,]    1    0    0    0
[4,]    1    0    1    0

$`3`
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    1    1    0    0

R> all(sapply(g.in, NROW) == degree(g, cmode = "indegree") + 1)

[1] TRUE

R> all(sapply(g.out, NROW) == degree(g, cmode = "outdegree") + 1)

[1] TRUE

R> all(sapply(g.comb, NROW) <= degree(g) + 1)

[1] TRUE

R> ego.size <- sapply(g.comb, NROW)
R> if(any(ego.size > 2))
+    sapply(g.comb[ego.size > 2], function(x){gden(x[-1, -1])})

         1          2          3          4          5          6          7
0.00000000 0.16666667 0.16666667 0.00000000 0.00000000 0.00000000 0.00000000
         8          9         10
0.00000000 0.08333333 0.00000000

Note that egocentric network density is often calculated as the density of ties among alters, i.e., neglecting ego's contribution (since ego must be tied to all alters by design). This is the form of density calculated above. In doing so, we have made use of the fact that ego.extract always places ego in the first row/column of each extracted adjacency matrix, thereby facilitating its removal where required. This example also makes use of degree and gden to calculate degree and graph density, respectively; these are discussed in more detail below.
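The alter-only density described here can be sketched in base R without sna. The example assumes, as the text states, that ego occupies the first row and column; the matrix `ego_net` and the helper `alter_density` are hypothetical, and density is taken as the fraction of possible directed, non-loop ties that are present.

```r
# Hypothetical ego network: ego is vertex 1, the alters are vertices 2-4.
ego_net <- matrix(c(0, 1, 1, 1,
                    1, 0, 1, 0,
                    1, 0, 0, 0,
                    1, 0, 1, 0), 4, 4, byrow = TRUE)

# Density among alters only: drop ego's row and column, then divide the
# number of observed directed ties by the number of possible ones.
alter_density <- function(m) {
  a <- m[-1, -1, drop = FALSE]
  k <- nrow(a)
  diag(a) <- 0                      # ignore self-ties
  sum(a) / (k * (k - 1))
}

alter_density(ego_net)
```

With fewer than two alters the denominator is zero, which is one reason the example above restricts attention to ego nets with more than two members.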

Where computation on attributes of neighboring vertices is required (as opposed to the ego nets themselves), we turn to gapply. As the following example illustrates, gapply can be used to count features of vertex neighborhoods (degree being the most trivial example); other statistics (e.g., means, quantiles, etc.) can be used as well.

R> g <- rgraph(6)
R> all(gapply(g, 1, rep(1, 6), sum) == degree(g, cmode = "outdegree"))

[1] TRUE

R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))

[1] TRUE

R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+    cmode = "freeman") / 2)

[1] TRUE

R> gapply(g, c(1, 2), 1:6, mean)

[1] 4.00 3.00 3.00 5.50 3.25 3.25

R> gapply(g, c(1, 2), 1:6, mean, distance = 2)

[1] 4.0 3.8 3.6 3.4 3.2 3.0
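The idea behind these calls (apply a function to the attribute values of each vertex's neighbors) can be mimicked in base R. The sketch below is not gapply itself: `neighbor_apply` is a hypothetical helper that handles only first-order out-neighborhoods (margin 1) and in-neighborhoods (margin 2), and the random graph is generated with rbinom rather than rgraph.

```r
set.seed(7)                          # arbitrary seed
n <- 6
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0                         # no self-ties

# Apply `fun` to the values of `x` held by each vertex's neighbors.
neighbor_apply <- function(adj, margin, x, fun) {
  if (margin == 2) adj <- t(adj)     # in-neighborhoods: read columns
  sapply(seq_len(nrow(adj)), function(i) fun(x[adj[i, ] > 0]))
}

out_deg   <- neighbor_apply(g, 1, rep(1, n), sum)   # counts out-neighbors
in_deg    <- neighbor_apply(g, 2, rep(1, n), sum)   # counts in-neighbors
mean_attr <- neighbor_apply(g, 1, 1:n, mean)        # mean index of out-neighbors
```

Summing a vector of ones over a neighborhood simply counts its members, which is why the first two results coincide with outdegree and indegree.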

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+    partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+    gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.
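The distinction between the two kinds of neighborhood can be illustrated with a short base R sketch. It assumes that the order-k partial out-neighborhood of v contains the vertices at geodesic distance exactly k from v, and that the cumulative neighborhood contains those at distance 1 through k; the helper names are hypothetical and the code is not sna's implementation.

```r
# Hypothetical toy graph: the directed path 1 -> 2 -> 3 -> 4.
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1

# Geodesic distances: d[i, j] is the smallest k with a length-k walk i -> j.
geodist_matrix <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  reach <- diag(n)                        # walks of length 0
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% adj > 0) * 1      # pairs joined by a length-k walk
    d[reach > 0 & is.infinite(d)] <- k    # record first arrival only
  }
  diag(d) <- 0
  d
}

d <- geodist_matrix(g)
partial_nbhd    <- function(d, k) (d == k) * 1
cumulative_nbhd <- function(d, k) (d >= 1 & d <= k) * 1

partial_nbhd(d, 2)       # ties 1 -> 3 and 2 -> 4 only
cumulative_nbhd(d, 2)    # the above plus the original edges
```

Cumulative neighborhoods can only gain ties as the order increases, whereas partial neighborhoods eventually become empty once every reachable vertex has been found.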

Visualization

Figure 1: Sample partial neighborhoods of increasing order (panels: Partial Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.
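As an illustration of a user-supplied layout, the sketch below defines a hypothetical `gplot.layout.grid` function that places vertices on a square grid. It assumes the convention described in the gplot.layout manual page: a layout function takes an adjacency matrix `d` and a list of parameters `layout.par`, and returns a two-column matrix of coordinates. The grid rule itself is an illustrative choice and not part of sna.

```r
# Hypothetical layout: fill a roughly square grid row by row.
gplot.layout.grid <- function(d, layout.par) {
  n <- NROW(d)
  per_row <- ceiling(sqrt(n))            # vertices per grid row
  idx <- seq_len(n) - 1
  cbind(idx %% per_row,                  # x: position within the row
        -(idx %/% per_row))              # y: rows run downward
}

# The function can be checked on its own, without any plotting:
coords <- gplot.layout.grid(matrix(0, 5, 5), NULL)
coords
```

With sna loaded, a layout defined this way would presumably be requested as `gplot(g, mode = "grid")`, since gplot resolves the mode argument to the matching gplot.layout function.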

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing the optional rgl package for R enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1.5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Figure 3: Sample visualizations using gplot with multiple layout and display options. Panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options.

Output from the above is shown in Figure 3.

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can, in and of itself, be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Figure 5: Examples of the use of gplot supplemental functions. Panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v,G) \equiv |N^{+}(v)|$), indegree ($c_{d-}(v,G) \equiv |N^{-}(v)|$), and total or "Freeman" degree ($c_{dt}(v,G) \equiv c_{d+}(v,G) + c_{d-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$ and $n = |V|$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
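The three Freeman indices can be computed directly from their definitions on a small example. The following base R sketch is a naive illustration, not the algorithm used by sna: it obtains geodesic distances and counts from powers of the adjacency matrix, and sums betweenness over ordered pairs of third parties, which is an assumption suited to the directed case. The toy graph and the helper name `geodesics` are hypothetical.

```r
# Hypothetical digraph: the cycle 1 -> 2 -> 3 -> 4 -> 1 plus the edge 1 -> 3.
g <- matrix(0, 4, 4)
g[cbind(c(1, 2, 3, 4, 1), c(2, 3, 4, 1, 3))] <- 1
n <- nrow(g)

# The first k with (A^k)[i, j] > 0 is the geodesic distance from i to j,
# and the entry itself is the number of i -> j geodesics.
geodesics <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n); sigma <- matrix(0, n, n)
  diag(d) <- 0; diag(sigma) <- 1
  walks <- diag(n)
  for (k in seq_len(n - 1)) {
    walks <- walks %*% adj
    new <- walks > 0 & is.infinite(d)
    d[new] <- k
    sigma[new] <- walks[new]
  }
  list(dist = d, count = sigma)
}
gd <- geodesics(g)
d <- gd$dist; s <- gd$count

freeman_degree   <- rowSums(g) + colSums(g)      # outdegree + indegree
closeness_sketch <- (n - 1) / rowSums(d)         # 0 if any distance is Inf

# v lies on an (i, j) geodesic iff d(i, v) + d(v, j) = d(i, j).
betweenness_sketch <- sapply(seq_len(n), function(v) {
  total <- 0
  for (i in setdiff(seq_len(n), v)) {
    for (j in setdiff(seq_len(n), c(v, i))) {
      if (is.finite(d[i, j]) && d[i, v] + d[v, j] == d[i, j])
        total <- total + s[i, v] * s[v, j] / s[i, j]
    }
  }
  total
})
```

Replacing the ratio `s[i, v] * s[v, j] / s[i, j]` with its numerator alone gives the stress variant described above.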

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
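The Bonacich formula can be evaluated with a single linear solve. The sketch below is a base R illustration rather than the bonpow implementation; the helper `bonpow_sketch` is hypothetical, and it assumes one common reading of the scaling convention, namely that $\alpha$ is chosen so that the sum of squared scores equals $|V|$.

```r
# Hypothetical digraph: the cycle 1 -> 2 -> 3 -> 4 -> 1 plus the edge 1 -> 3.
g <- matrix(0, 4, 4)
g[cbind(c(1, 2, 3, 4, 1), c(2, 3, 4, 1, 3))] <- 1

bonpow_sketch <- function(adj, beta) {
  n <- nrow(adj)
  # Solve (I - beta A) c = A 1 rather than forming the inverse explicitly.
  raw <- as.vector(solve(diag(n) - beta * adj, adj %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))        # assumed scaling: sum of squares = n
}

bp_zero <- bonpow_sketch(g, 0)      # proportional to outdegree
bp_pos  <- bonpow_sketch(g, 0.2)    # rewards ties to central alters
bp_neg  <- bonpow_sketch(g, -0.2)   # rewards ties to peripheral alters
```

At $\beta = 0$ the linear system reduces to $A\mathbf{1}$, the vector of row sums, which is the sense in which the measure collapses to degree.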

An example of a more specialized family of node-level indices is given by the Gould andFernandez (1989) brokerage scores The total brokerage of a given vertex v is defined asthe number of ordered pairs (vprime vprimeprime) such that (vprime v) (v vprimeprime) isin E and (vprime vprimeprime) 6isin Emdashthatis the number of pairs for which v serves as a local bridge Now let us posit a vectorof states s with V such that si is the state of vi isin V (ldquoStaterdquo in this case can be anyexogenous covariate although Gould and Fernandez initially intended it to be a categoricalindicator of group membership) Gould and Fernandez define five specific types of brokerage(or brokerage roles) based on the states of the three vertices within a locally bridged pairFor an ordered triad (vi vj vk) with brokering vertex vj the possible brokerage roles arecoordinating (si = sj = sk) itinerant (si = sk si 6= sj) gatekeeping (sj = sk si 6= sj)representative (si = sj sj 6= sk) and liaison (si 6= sj sj 6= sk si 6= sk) The brokerage scorefor vertex v with respect to a particular role is defined as the number of ordered triads of theappropriate type for which v is a broker The brokerage function computes these (and total)brokerage scores for all vertices as well as the total amount of brokerage within each roleperformed throughout the network First and second moments for brokerage scores undera null hypothesis of random association (holding fixed s and the expected density) are alsoprovided as well as the z-tests suggested by Gould and Fernandez It should be cautionedthat the authors did not prove that the statistics in question are asymptotically normal under


the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
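The role definitions above can be made concrete with a short base R sketch. This is an illustration only, not the algorithm used by sna's brokerage function: the helper name `gf_brokerage` is hypothetical, the column labels follow the role names as given in this paper, and the input is assumed to be a 0/1 adjacency matrix without missing data.

```r
# Hypothetical helper (not sna's brokerage()): count Gould-Fernandez
# brokerage roles per vertex for a directed 0/1 adjacency matrix `adj`
# and a vector of vertex states `memb`.
gf_brokerage <- function(adj, memb) {
  n <- nrow(adj)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O", "t")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers the ordered pair (i, k) when i -> j -> k exists but i -> k does not
    if (adj[i, j] == 1 && adj[j, k] == 1 && adj[i, k] == 0) {
      si <- memb[i]; sj <- memb[j]; sk <- memb[k]
      role <- if (si == sj && sj == sk) {
        "w_I"    # coordinating: all three states equal
      } else if (si == sk) {
        "w_O"    # itinerant: endpoints share a state, broker differs
      } else if (sj == sk) {
        "b_IO"   # gatekeeping, as defined in the text: s_j = s_k, s_i != s_j
      } else if (si == sj) {
        "b_OI"   # representative, as defined in the text: s_i = s_j, s_j != s_k
      } else {
        "b_O"    # liaison: all three states differ
      }
      out[j, role] <- out[j, role] + 1
      out[j, "t"] <- out[j, "t"] + 1
    }
  }
  out
}

# Toy example: the two-path 1 -> 2 -> 3 with no direct 1 -> 3 tie
adj <- matrix(0, 3, 3)
adj[1, 2] <- 1
adj[2, 3] <- 1
res <- gf_brokerage(adj, c("a", "b", "a"))
res
```

The triple loop is cubic in the number of vertices, which is acceptable for a toy graph; it is written for readability rather than speed.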

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")

 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")

 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)

 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)

 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)

 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)

 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)

 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)

 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985


R> infocent(dat)

 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")

 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+     evcent(dat, rescale = TRUE)) < 1e-10)

[1] TRUE

R> bonpow(dat, exponent = -0.5)

 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518


w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores


for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j), (j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j), (j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
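As a worked check of equations (1) and (2), the following base R sketch computes raw and normalized degree centralization for an undirected star. The function name `freeman_centralization` is hypothetical and the code does not call sna; the normalizing constant $(n-1)(n-2)$ is the standard maximum deviation for degree on simple undirected graphs and is supplied by hand here.

```r
# Hypothetical helper illustrating equations (1) and (2); sna's own
# centralization() function is the package routine for this task.
freeman_centralization <- function(scores, tmaxdev = NULL) {
  raw <- sum(max(scores) - scores)            # equation (1)
  if (is.null(tmaxdev)) raw else raw / tmaxdev
}

# Undirected star on n vertices: vertex 1 is tied to every other vertex
n <- 6
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1
deg <- rowSums(star)

raw_c <- freeman_centralization(deg)
alt_c <- n * (max(deg) - mean(deg))           # equation (2), same quantity
norm_c <- freeman_centralization(deg, tmaxdev = (n - 1) * (n - 2))
c(raw = raw_c, via_eq2 = alt_c, normalized = norm_c)
```

For the star, the raw score equals the theoretical maximum, so the normalized index is 1; a graph in which every vertex has the same degree (for example a cycle) gives 0.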

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)

[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)

[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")

[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)

[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)

[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")

[1]   0  21 106 254 582

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

R> connectedness(g)

[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)

[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")

[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)

[1] 0.2 1.0 1.0 1.0 1.0

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+     n <- NROW(dat)
+     if (tmaxdev)
+         return((n - 1) * choose(n - 1, 2))
+     odeg <- degree(dat, cmode = "outdegree")
+     choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
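The definition of the structure statistics can be checked directly in base R. The sketch below is an assumption-laden illustration, not the code behind sna's geodist or structure.statistics: the helper names `geodesic_dist` and `structure_stats` are hypothetical, and geodesic distances are obtained with a plain Floyd-Warshall pass, which is adequate only for small graphs.

```r
# Hypothetical helpers; sna provides geodist() and structure.statistics()
# for real use.
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)     # direct ties have length 1
  diag(d) <- 0
  for (m in seq_len(n)) {          # Floyd-Warshall: allow paths through vertex m
    d <- pmin(d, outer(d[, m], d[m, ], "+"))
  }
  d
}

structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_dist(adj)
  # s_i = N^-2 * number of ordered pairs (j, k) with d(j, k) <= i
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed 3-cycle: 1 -> 2 -> 3 -> 1
cyc <- matrix(0, 3, 3)
cyc[1, 2] <- 1
cyc[2, 3] <- 1
cyc[3, 1] <- 1
ss <- structure_stats(cyc)
ss
```

Because every vertex is at distance 0 from itself, $s_0 = 1/N$ for any graph, which matches the first entry of the structure.statistics output shown later in this section.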

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small


subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
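The logic of a one-tailed CUG test can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not sna's cugtest: the helper names (`dyad_recip`, `rdigraph_edges`, `cug_edges`) are hypothetical, the graph is a loopless 0/1 digraph, and the null conditions on order and on the exact number of edges.

```r
# Hypothetical helpers; sna's cugtest() is the package routine.
dyad_recip <- function(adj) {
  ut <- upper.tri(adj)
  mean(adj[ut] == t(adj)[ut])        # fraction of symmetric (mutual or null) dyads
}

rdigraph_edges <- function(n, m) {
  adj <- matrix(0, n, n)
  offdiag <- which(row(adj) != col(adj))
  adj[sample(offdiag, m)] <- 1       # uniform over digraphs with exactly m edges
  adj
}

cug_edges <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  m <- sum(adj[row(adj) != col(adj)])
  obs <- stat(adj)
  sims <- replicate(reps, stat(rdigraph_edges(n, m)))
  c(obs = obs,
    p_upper = mean(sims >= obs),     # estimate of Pr(t(H) >= t(G))
    p_lower = mean(sims <= obs))     # estimate of Pr(t(H) <= t(G))
}

# Observed graph: four mutual dyads on eight vertices, no asymmetric ties
set.seed(42)
obs_g <- matrix(0, 8, 8)
for (i in c(1, 3, 5, 7)) {
  obs_g[i, i + 1] <- 1
  obs_g[i + 1, i] <- 1
}
cug_edges(obs_g, dyad_recip)
```

Swapping `rdigraph_edges` for a generator that conditions on a different statistic (for example the dyad census) changes the null hypothesis without changing the rest of the procedure.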

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+     dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+     cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+     g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.3333333
                1stQ:    -0.06666667
                Med:     0
                Mean:    -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:     -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
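The distance-then-cluster workflow described above can be sketched with base R alone. The helper `se_hamming` is hypothetical and deliberately simplified: it compares the in- and out-neighborhoods of each pair of vertices while ignoring the entries involving the pair itself, which is one reasonable reading of the definition but is not claimed to match the exact conventions of sna's sedist.

```r
# Hypothetical helper (not sna's sedist()): Hamming distance between the
# neighborhoods of every pair of vertices in a 0/1 adjacency matrix.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    keep <- setdiff(seq_len(n), c(i, j))              # drop the i, j entries
    d[i, j] <- sum(adj[i, keep] != adj[j, keep]) +    # out-neighborhoods
               sum(adj[keep, i] != adj[keep, j])      # in-neighborhoods
  }
  d
}

# Two positions: vertices 1-3 each send ties to all of vertices 4-6
adj <- matrix(0, 6, 6)
adj[1:3, 4:6] <- 1

d <- se_hamming(adj)
hc <- hclust(as.dist(d), method = "complete")   # base R hierarchical clustering
classes <- cutree(hc, k = 2)                    # cut into two equivalence classes
classes
```

Vertices within each position are at distance 0 from one another, so any sensible cut recovers the two classes; with noisy data the choice of cut height or number of clusters becomes a substantive decision.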

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
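Density reduction and expansion, as described in the two paragraphs above, amount to a pair of short matrix operations. The base R sketch below uses hypothetical helper names (`block_density`, `expand_density`) and makes two assumptions that are not taken from sna: diagonal cells are excluded from block densities, and blocks with no eligible cells are expanded with edge probability 0.

```r
# Hypothetical helpers; sna's blockmodel() and blockmodel.expand() are the
# package routines for these steps.
block_density <- function(adj, memb) {
  classes <- sort(unique(memb))
  k <- length(classes)
  dens <- matrix(NA_real_, k, k, dimnames = list(classes, classes))
  offdiag <- row(adj) != col(adj)
  for (a in seq_len(k)) for (b in seq_len(k)) {
    cell <- outer(memb == classes[a], memb == classes[b]) & offdiag
    if (any(cell)) dens[a, b] <- mean(adj[cell])   # mean cell value in the block
  }
  dens
}

expand_density <- function(dens, sizes) {
  memb <- rep(seq_len(nrow(dens)), times = sizes)  # new class membership vector
  p <- dens[memb, memb, drop = FALSE]              # Bernoulli parameter matrix
  p[is.na(p)] <- 0                                 # assumption: undefined blocks get 0
  n <- length(memb)
  adj <- matrix(rbinom(n * n, 1, p), n, n)         # independent edge draws
  diag(adj) <- 0
  adj
}

# A two-class image with complete within-class ties and no ties between classes
image <- matrix(c(1, 0,
                  0, 1), 2, 2, byrow = TRUE)
new_g <- expand_density(image, sizes = c(3, 2))
new_g
block_density(new_g, c(1, 1, 1, 2, 2))
```

Because the class sizes are passed separately from the image, the expanded graph need not have the same order as the graph from which the densities were estimated.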

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
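The boolean inner product mentioned above is a one-liner in base R. The function name `compose` is hypothetical and stands in for the package's own operator; the example is chosen to show a loop arising from two loopless graphs.

```r
# Hypothetical helper: boolean inner product of two 0/1 adjacency matrices.
compose <- function(a, b) {
  ((a %*% b) > 0) * 1     # tie v -> v' iff some v'' has v -> v'' in a and v'' -> v' in b
}

# G has the single edge 1 -> 2; G' has the single edge 2 -> 1
g  <- matrix(0, 3, 3)
gp <- matrix(0, 3, 3)
g[1, 2]  <- 1
gp[2, 1] <- 1

comp <- compose(g, gp)
comp            # the only tie is the loop (1, 1)
```

Composition is not commutative in general: here `compose(gp, g)` yields the loop at vertex 2 instead.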

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left( A^G_{ij} - \mu_G \right) \left( A^H_{ij} - \mu_H \right)}{|V| \, (|V| - 1)}, \tag{3}$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V| \, (|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G) \, \mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
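Equation (3) and the associated correlation and Hamming distance can be verified by hand with a few lines of base R. The helper names below (`offdiag`, `graph_cov`, `graph_cor`, `hamming`) are hypothetical; they assume loopless directed 0/1 graphs on a common vertex set with no missing data, and they are not the implementations of gcov, gcor, or hdist.

```r
# Hypothetical helpers illustrating equation (3) for loopless digraphs.
offdiag <- function(a) a[row(a) != col(a)]       # the |V|(|V|-1) edge variables

graph_cov <- function(g, h) {
  x <- offdiag(g)
  y <- offdiag(h)
  mean((x - mean(x)) * (y - mean(y)))            # divides by |V|(|V|-1), as in (3)
}

graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}

hamming <- function(g, h) sum(offdiag(g) != offdiag(h))

# Directed 3-cycle and its complement
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
h <- 1 - g
diag(h) <- 0

c(cov = graph_cov(g, h), cor = graph_cor(g, h), hamming = hamming(g, h))
```

Note that the divisor is the number of edge variables rather than that number minus one, so `graph_cov` differs from R's `cov` by a constant factor, while the correlation is unaffected.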

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
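The logic of a simple QAP test can be sketched in a few lines of base R. This is a hand-rolled illustration rather than the qaptest implementation: the function gcor_simple and the simulated graphs are assumptions, and the statistic is the product-moment correlation of off-diagonal adjacency entries.

```r
set.seed(7)
n <- 12
offdiag <- !diag(n)

# Hypothetical stand-in for a graph correlation: correlate off-diagonal cells.
gcor_simple <- function(a, b) cor(a[offdiag], b[offdiag])

g1 <- matrix(rbinom(n * n, 1, 0.4), n, n)
# g2 copies g1 on roughly 70% of dyads, so the graphs are associated.
keep <- matrix(rbinom(n * n, 1, 0.7), n, n)
g2 <- ifelse(keep == 1, g1, matrix(rbinom(n * n, 1, 0.4), n, n))

obs <- gcor_simple(g1, g2)

# QAP null: relabel the vertices of g1 (rows and columns together), which
# preserves its structure while breaking its alignment with g2.
reps <- 1000
null <- replicate(reps, {
  p <- sample(n)
  gcor_simple(g1[p, p], g2)
})
p_upper <- mean(null >= obs)      # estimated one-sided p-value
```

Because rows and columns are permuted jointly, each null replicate is isomorphic to g1; only the assignment of individuals to positions changes.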


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
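The point estimates in a network regression of this kind are those of an ordinary regression on vectorized adjacency matrices. The base R sketch below illustrates this with lm, under the assumption that loops (diagonal entries) are excluded; it reproduces only the coefficients and residual degrees of freedom, not the QAP-based tests. Graph generation via rbinom is a stand-in for rgraph.

```r
set.seed(3)
n <- 20
offdiag <- !diag(n)

# Four random predictor graphs, stored as a 4 x n x n array.
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

# Vectorize the off-diagonal dyads of each graph.
df <- data.frame(y  = y[offdiag],
                 x1 = x[1, , ][offdiag],
                 x2 = x[2, , ][offdiag],
                 x3 = x[3, , ][offdiag],
                 x4 = x[4, , ][offdiag])

fit <- lm(y ~ x1 + x2 + x3 + x4, data = df)
est <- coef(fit)   # approximately 0, 1, 4, 2, 0
```

With 20 vertices there are 380 directed dyads, so five estimated coefficients leave 375 residual degrees of freedom.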

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173 0.680   0.320   0.503
x1           0.9411361   2.5628914 0.985   0.015   0.019
x2           4.1473292  63.2648084 1.000   0.000   0.000
x3           1.8630911   6.4436238 1.000   0.000   0.000
x4          -0.1757242   0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

      0   1
0     0   0
1    39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
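The classification diagnostics reported above follow directly from the contingency table, as the short base R calculation below shows. The table entries are taken from the printed output; the object names are ours.

```r
# Predicted (rows) by actual (columns), as printed in the summary output.
tab <- matrix(c(0, 39, 0, 341), nrow = 2,
              dimnames = list(predicted = c("0", "1"),
                              actual    = c("0", "1")))

total_correct <- sum(diag(tab)) / sum(tab)            # 341 / 380
pred1_correct <- tab["1", "1"] / sum(tab["1", ])      # 341 / 380
pred0_correct <- tab["0", "0"] / sum(tab["0", ])      # 0 / 0, i.e., NaN
false_neg     <- tab["0", "1"] / sum(tab[, "1"])      # 0 / 341
false_pos     <- tab["1", "0"] / sum(tab[, "0"])      # 39 / 39
density       <- sum(tab[, "1"]) / sum(tab)           # share of dyads with ties
```

Since the model never predicts an absent tie, the fraction of predicted 0s that are correct is undefined (0/0), and every actual non-tie counts as a false positive.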

2.7. Network inference and process models

A final category of functions supplied by sna consists of those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
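The correspondence between the two reporting models can be written out explicitly. The sketch below assumes the usual reading of the consensus model described above: with probability c (competency) a source reports the true edge value, and otherwise reports a tie with probability b (bias). Under that assumption the false negative rate is (1 - c)(1 - b) and the false positive rate is (1 - c) b. The function names are hypothetical and are not part of sna.

```r
# Competency/bias -> error rates (assumed parameterization; see lead-in).
br_to_rates <- function(competency, bias) {
  c(em = (1 - competency) * (1 - bias),   # false negative rate
    ep = (1 - competency) * bias)         # false positive rate
}

# Error rates -> competency/bias; only defined when em + ep < 1
# (and the bias is undefined if both error rates are exactly zero).
rates_to_br <- function(em, ep) {
  stopifnot(em + ep < 1)
  c(competency = 1 - em - ep, bias = ep / (em + ep))
}

rates <- br_to_rates(competency = 0.6, bias = 0.25)   # em = 0.3, ep = 0.1
back  <- rates_to_br(rates[["em"]], rates[["ep"]])    # recovers 0.6 and 0.25
```

Under this mapping the error rates sum to 1 - c, which is why the two likelihoods coincide only when the sum is below 1; error rates summing to more than 1 would require a negative competency.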

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
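The conditional tie probability is simple to evaluate for given bias probabilities and counts. The following base R function is a direct transcription of the expression above; the function name and the numerical values are purely illustrative.

```r
# p_base: Pr(B_e), the baseline nomination probability
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      vector of counts s_k(i, j, T) of potential bias events of each type
bn_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

p_none <- bn_tie_prob(0.05, c(0.3, 0.2), c(0, 0))  # no bias events: 0.05
p_some <- bn_tie_prob(0.05, c(0.3, 0.2), c(1, 2))  # 1 - 0.95 * 0.7 * 0.8^2
```

With no potential bias events the probability reduces to the baseline, and each additional potential bias event can only increase it.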

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990, and Cliff and Ord 1973; Anselin 1988, for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly.

Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
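As a small illustration of an autocorrelation index, the sketch below computes Moran's I by hand for two attribute vectors on a six-vertex ring. It uses the standard formula I = (n / sum(W)) * z'Wz / z'z with z the mean-centered attribute. It is a hand-rolled calculation rather than a call to nacf, and the function name morans_i is our own.

```r
morans_i <- function(y, w) {
  z <- y - mean(y)
  (length(y) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Undirected ring on six vertices: each vertex is tied to its two neighbors.
n <- 6
ring <- matrix(0, n, n)
for (i in 1:n) {
  j <- i %% n + 1
  ring[i, j] <- ring[j, i] <- 1
}

y_alt <- rep(c(1, -1), n / 2)         # neighbors always differ
y_blk <- rep(c(1, -1), each = n / 2)  # neighbors mostly agree

i_alt <- morans_i(y_alt, ring)        # -1
i_blk <- morans_i(y_blk, ring)        # 1/3
```

The alternating pattern gives strong negative autocorrelation, while the blocked pattern gives positive autocorrelation. Comparing such indices across candidate weight matrices is one informal way to choose inputs for a subsequent model fit.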


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors, or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
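One quick way to check a candidate pair of error rate priors is to estimate, by simulation, how much prior mass falls in the region where em + ep > 1. The base R sketch below does this for independent beta(2, 11) priors and, for contrast, for flat beta(1, 1) priors; the Monte Carlo approach and the object names are our own illustration, not an sna facility.

```r
set.seed(11)
draws <- 1e5

# Independent beta(2, 11) priors on false negative and false positive rates.
em_inf <- rbeta(draws, 2, 11)
ep_inf <- rbeta(draws, 2, 11)
p_informative <- mean(em_inf + ep_inf > 1)

# Flat priors, for comparison: about half the mass is in the region.
em_flat <- rbeta(draws, 1, 1)
ep_flat <- rbeta(draws, 1, 1)
p_flat <- mean(em_flat + ep_flat > 1)
```

The beta(2, 11) priors place almost no mass in the region, whereas flat priors place roughly half of their mass there.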

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
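The behavior of the data analytic estimators can be explored with a hand-rolled base R sketch. It assumes the usual definitions: the central graph takes the majority report on each dyad, while the LAS estimate for the (i, j) tie uses only the reports of informants i and j (the union requires either to report the tie, the intersection both). For simplicity, all informants share the same error rates. This is not the consensus implementation, and all names are illustrative.

```r
set.seed(5)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0
em <- 0.375   # common false negative rate (a simplifying assumption)
ep <- 0.04    # common false positive rate

# dat[k, i, j] is informant k's report on the (i, j) tie.
dat <- array(0, dim = c(n, n, n))
p_report <- g * (1 - em) + (1 - g) * ep
for (k in 1:n) dat[k, , ] <- matrix(rbinom(n * n, 1, p_report), n, n)

# Central graph: majority vote across informants on each dyad.
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# LAS: use only the two informants involved in each dyad.
las_union <- las_inter <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) {
  las_union[i, j] <- max(dat[i, i, j], dat[j, i, j])
  las_inter[i, j] <- min(dat[i, i, j], dat[j, i, j])
}

offdiag <- !diag(n)
acc <- c(central   = mean(central[offdiag] == g[offdiag]),
         las_union = mean(las_union[offdiag] == g[offdiag]),
         las_inter = mean(las_inter[offdiag] == g[offdiag]))
```

Under the intersection rule a true tie is recovered only if both informants report it, so a high false negative rate is compounded rather than averaged out.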

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam. Panels: "Fitted vs. Observed Values"; "Fitted Values vs. Estimated Disturbances"; "Normal Q-Q Residual Plot" (theoretical versus sample quantiles); "Net Influence Plot".

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments
R> all(gapply(g, 2, rep(1, 6), sum) == degree(g, cmode = "indegree"))
[1] TRUE

R> all(gapply(g, c(1, 2), rep(1, 6), sum) == degree(symmetrize(g),
+   cmode = "freeman") / 2)
[1] TRUE

R> gapply(g, c(1, 2), 1:6, mean)
[1] 4.00 3.00 3.00 5.50 3.25 3.25

R> gapply(g, c(1, 2), 1:6, mean, distance = 2)
[1] 4.0 3.8 3.6 3.4 3.2 3.0

To obtain adjacency matrices for neighborhoods themselves, we employ the neighborhood function:

R> g <- rgraph(10, tp = 2/9)
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Partial Neighborhood of Order", i))
R> neigh <- neighborhood(g, 9, neighborhood.type = "out", return.all = TRUE,
+   partial = FALSE)
R> par(mfrow = c(3, 3))
R> for(i in 1:9)
+   gplot(neigh[i, , ], main = paste("Cumulative Neighborhood of Order", i))

Typical output for the above is shown in Figures 1 (partial neighborhoods) and 2 (cumulative neighborhoods). These displays highlight the difference between partial and cumulative neighborhoods, illustrating each at all orders of depth. The rapidity with which such neighborhoods "fill out" the network is instructive of properties such as local clustering; we will revisit this issue when we discuss the structure.statistics function below.
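The difference between the two neighborhood types can also be checked by hand on a very small graph. The following is a minimal base R sketch that does not call sna: `geo_dist` is a hypothetical helper written only for this illustration, and the sketch assumes that the partial neighborhood of order i contains the vertices at geodesic distance exactly i, while the cumulative neighborhood contains those at distances 1 through i.

```r
# Hypothetical helper (not an sna function): geodesic distances obtained by
# repeatedly extending walks by one step; unreachable pairs remain Inf.
geo_dist <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  walk <- diag(n)                       # walks of length 0
  for (k in seq_len(n - 1)) {
    walk <- (walk %*% adj > 0) * 1      # pairs joined by some walk of length k
    d[walk > 0 & is.infinite(d)] <- k   # record the first time a pair is reached
  }
  d
}

# Toy digraph: 1 -> 2 -> 3 -> 4, plus a shortcut 1 -> 3
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 2, 3, 1), c(2, 3, 4, 3))] <- 1

d <- geo_dist(adj)
partial_2 <- (d == 2) * 1               # distance exactly 2
cumulative_2 <- (d >= 1 & d <= 2) * 1   # distance 1 or 2

which(partial_2 == 1, arr.ind = TRUE)   # the ordered pairs (1, 4) and (2, 4)
sum(cumulative_2)                       # 6 ordered pairs
```

In this toy case the order-2 cumulative neighborhood already contains every reachable pair, which is the "filling out" behavior described above.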

Visualization

Network visualization has been a fundamental aspect of social network analysis since its inception (Freeman 2004), and this functionality is an important feature of sna. The primary "workhorse" routine for graph visualization within sna is gplot, which displays an input network using a two-dimensional layout. Many options are available to gplot, including the ability to specify characteristics such as size, color, and shape for individual vertices, edges, and edge labels. Vertex layout is controlled via a modular collection of layout functions (gplot.layout), which are called transparently by gplot itself. Built-in functions include the well-known algorithms of Fruchterman and Reingold (1991), Kamada and Kawai (1989), and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.

Figure 1: Sample partial neighborhoods of increasing order (nine panels, orders 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order partial neighborhood of $v$.

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order (nine panels, orders 1 through 9); vertex $v$ is adjacent to vertex $v'$ in the $i$th panel iff $v'$ belongs to the $i$th order cumulative neighborhood of $v$.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5, below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc., are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+   vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+   displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot, with multiple layout and display options (panels: Default, Curved Edges, MDS Layout, Circular Layout, Sociomatrix, Multiple Options).

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+   xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+   col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+   xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+   offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+   arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+   1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{dt}(v, G) \equiv c_{d+}(v, G) + c_{d-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $g'(v', v, v'', G) / g(v', v'', G)$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
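The degree and closeness definitions above are simple enough to verify directly. The sketch below uses base R only and is not the sna implementation: `geo_dist` and `closeness_sketch` are hypothetical helpers introduced for illustration, and $n$ is taken to be the number of vertices.

```r
# Hypothetical helper (not an sna function): geodesic distances obtained by
# repeatedly extending walks by one step; unreachable pairs remain Inf.
geo_dist <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  walk <- diag(n)
  for (k in seq_len(n - 1)) {
    walk <- (walk %*% adj > 0) * 1
    d[walk > 0 & is.infinite(d)] <- k
  }
  d
}

# Closeness as (n - 1) / (sum of geodesic distances from v)
closeness_sketch <- function(adj) {
  (nrow(adj) - 1) / rowSums(geo_dist(adj))
}

# Strongly connected toy digraph: 1 -> 2, 2 -> 3, 3 -> 1, 1 -> 3
adj <- matrix(0, 3, 3)
adj[cbind(c(1, 2, 3, 1), c(2, 3, 1, 3))] <- 1

outdeg <- rowSums(adj)         # |N+(v)|: 2 1 1
indeg <- colSums(adj)          # |N-(v)|: 1 1 2
freeman <- outdeg + indeg      # total degree: 3 2 3
cl <- closeness_sketch(adj)    # 1, 2/3, 2/3

# Removing 3 -> 1 leaves vertices 2 and 3 unable to reach vertex 1, so their
# distance sums are infinite and their closeness scores collapse to 0.
adj2 <- adj
adj2[3, 1] <- 0
cl2 <- closeness_sketch(adj2)  # 1, 0, 0
```

The second graph shows the fragility noted above: a single missing path is enough to drive a vertex's closeness to zero.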

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $\mathbf{A}$ (where $\mathbf{A}$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (\mathbf{I} - \beta \mathbf{A})^{-1} \mathbf{A} \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $\mathbf{A}$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
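The Bonacich expression can be evaluated directly with a linear solve. The following base R sketch is an illustration only, not the code used by bonpow: `bonacich_sketch` is a hypothetical helper, and it assumes that the scaling convention means the sum of squared scores equals $|V|$.

```r
# Hypothetical helper (not the sna implementation): Bonacich power scores
# alpha * (I - beta * A)^{-1} A 1, with alpha chosen (by assumption) so that
# the sum of squared scores equals the number of vertices.
bonacich_sketch <- function(adj, beta) {
  n <- nrow(adj)
  raw <- as.vector(solve(diag(n) - beta * adj, adj %*% rep(1, n)))
  raw * sqrt(n / sum(raw^2))
}

# Toy digraph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1, 4 -> 1
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 1, 2, 3, 4), c(2, 3, 3, 1, 1))] <- 1

bp0 <- bonacich_sketch(adj, 0)        # beta = 0
bp0 / rowSums(adj)                    # constant ratio: proportional to outdegree

bpneg <- bonacich_sketch(adj, -0.5)   # beta < 0
sum(bpneg^2)                          # equals 4, the number of vertices
```

With $\beta = 0$ the inverse drops out and the score reduces to $\alpha \mathbf{A} \mathbf{1}$, the vector of row sums, which is why the first ratio is constant.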

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
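The role definitions above translate into a direct triad count. The following base R sketch is a brute-force illustration under the definitions just given; `brokerage_counts` is a hypothetical helper, not the sna brokerage function, and it computes only the raw counts (no expectations or z-scores).

```r
# Hypothetical helper: count Gould-Fernandez brokerage roles for each vertex j
# over ordered triads (i, j, k) with i -> j, j -> k, and no tie i -> k.
brokerage_counts <- function(adj, s) {
  n <- nrow(adj)
  roles <- c("coordinating", "itinerant", "gatekeeping",
             "representative", "liaison")
  out <- matrix(0, n, length(roles) + 1,
                dimnames = list(NULL, c(roles, "total")))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (length(unique(c(i, j, k))) < 3) next       # need three distinct vertices
    if (adj[i, j] == 1 && adj[j, k] == 1 && adj[i, k] == 0) {
      role <- if (s[i] == s[j] && s[j] == s[k]) {
        "coordinating"
      } else if (s[i] == s[k]) {
        "itinerant"
      } else if (s[j] == s[k]) {
        "gatekeeping"
      } else if (s[i] == s[j]) {
        "representative"
      } else {
        "liaison"
      }
      out[j, role] <- out[j, role] + 1
      out[j, "total"] <- out[j, "total"] + 1
    }
  }
  out
}

# Toy digraph: 1 -> 2, 2 -> 3, 4 -> 2, with group labels a, a, b, b
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 2, 4), c(2, 3, 2))] <- 1
s <- c("a", "a", "b", "b")

bc <- brokerage_counts(adj, s)
bc[2, ]   # vertex 2: one representative triad, one itinerant triad, total 2
```

Vertex 2 brokers the triad (1, 2, 3) as a representative (it shares a group with the sender) and the triad (4, 2, 3) as an itinerant broker (sender and receiver share a group that the broker is outside of).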

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+   evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE

R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $(i, j, k) \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores; see footnote 2) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

Footnote 2: For instance, when all vertices are automorphically equivalent.
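Equations (1) and (2) can be checked numerically for degree centrality. The base R sketch below is illustrative only: `centralization_raw` is a hypothetical helper rather than the sna centralization function, and the normalizing constant $(n - 1)(n - 2)$ is the standard maximum for degree on undirected graphs of order $n$, attained by the star.

```r
# Raw Freeman centralization from a vector of centrality scores, per Eq. (1)
centralization_raw <- function(scores) sum(max(scores) - scores)

n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1                       # undirected star, vertex 1 at the center
ring <- matrix(0, n, n)
ring[cbind(1:n, c(2:n, 1))] <- 1
ring <- ring + t(ring)                 # undirected cycle

deg_star <- rowSums(star)              # 4 1 1 1 1
deg_ring <- rowSums(ring)              # 2 2 2 2 2

c_star <- centralization_raw(deg_star)             # Eq. (1): 12
c_eq2 <- n * (max(deg_star) - mean(deg_star))      # Eq. (2): also 12
tmax <- (n - 1) * (n - 2)              # assumed maximum deviation for degree

c_star / tmax                          # normalized score of 1 for the star
centralization_raw(deg_ring) / tmax    # 0: all vertices have equal degree
```

The two extremes match the description above: the star attains the maximum, while a graph whose vertices all share the same score has centralization 0.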

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

Rgt g lt- rgraph(10 5 tprob = c(01 025 05 075 09))

Rgt gden(g)

[1] 006666667 031111111 054444444 072222222 093333333

Rgt grecip(g)

[1] 08666667 03777778 04888889 06666667 08666667

Rgt grecip(g measure = edgewise)

[1] 00000000 00000000 05306122 07692308 09285714

Rgt grecip(g) == 1 - hierarchy(g)

[1] TRUE TRUE TRUE TRUE TRUE

Rgt gtrans(g)

[1] 10000000 02957746 05047619 06809651 09326923

Rgt gtrans(g measure = weakcensus)

3Kn is the complete graph on n vertices with Knm denoting the complete bipartite graph on n and mvertices and Nn the null or empty graph on n vertices

26 Social Network Analysis with sna

[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0
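The difference between dyadic and edgewise reciprocity seen in the output above can be illustrated with a small base R sketch. The helper `recip_sketch` is hypothetical (it is not the sna implementation) and assumes a loopless 0/1 adjacency matrix: the dyadic measure is taken as the fraction of dyads that are symmetric (mutual or null), and the edgewise measure as the fraction of edges that are reciprocated.

```r
# Hypothetical helper (not part of sna): reciprocity of a 0/1 adjacency matrix.
recip_sketch <- function(a, measure = c("dyadic", "edgewise")) {
  measure <- match.arg(measure)
  up <- upper.tri(a)
  fwd <- a[up]        # ties i -> j for i < j
  bwd <- t(a)[up]     # ties j -> i for the same dyads
  mut <- sum(fwd == 1 & bwd == 1)
  asym <- sum(fwd != bwd)
  nul <- sum(fwd == 0 & bwd == 0)
  if (measure == "dyadic")
    (mut + nul) / (mut + asym + nul)   # symmetric dyads / all dyads
  else
    2 * mut / (2 * mut + asym)         # reciprocated edges / all edges
}

a <- matrix(0, 4, 4)
a[1, 2] <- a[2, 1] <- 1   # one mutual dyad
a[1, 3] <- 1              # one asymmetric dyad
recip_sketch(a, "dyadic")
recip_sketch(a, "edgewise")
```

Under these definitions a sparse graph with no mutual dyads has high dyadic reciprocity (most dyads are null) but zero edgewise reciprocity, which matches the pattern in the first two graphs of the example.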

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if (tmaxdev)
+     return((n - 1) * choose(n - 1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
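The tmaxdev calling convention can be sketched without sna as follows. Both functions are hypothetical stand-ins written for illustration (they are not sna's degree or centralization), and the maximum deviation $(n-1)^2$ assumes a loopless digraph, where it is attained by an out-star.

```r
# Hypothetical stand-ins for a centrality routine and its wrapper; base R only.
outdeg_sketch <- function(a, tmaxdev = FALSE, ...) {
  n <- NROW(a)
  if (tmaxdev)
    return((n - 1)^2)   # out-star: n - 1 vertices each deviate by n - 1
  rowSums(a)
}

centralization_sketch <- function(a, FUN, ...) {
  score <- FUN(a, ...)
  # Sum of deviations from the maximum score, normalized by the
  # theoretical maximum reported by the centrality routine itself.
  sum(max(score) - score) / FUN(a, tmaxdev = TRUE, ...)
}

star <- matrix(0, 5, 5)
star[1, -1] <- 1        # vertex 1 sends a tie to every other vertex
centralization_sketch(star, outdeg_sketch)
```

The wrapper never needs to know how the maximum deviation is derived; it only asks the index function for it, which is the design that lets user-supplied indices be used directly.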

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
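The definition of the structure statistics can be made concrete with a short base R sketch. The helpers below are hypothetical illustrations (not sna's geodist or structure.statistics); geodesic distances are computed by a vectorized Floyd-Warshall pass, with unreachable pairs at distance Inf.

```r
# Hypothetical helper: geodesic distances on a 0/1 adjacency matrix.
geodist_sketch <- function(a) {
  n <- NROW(a)
  d <- ifelse(a > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n))
    d <- pmin(d, outer(d[, k], d[k, ], "+"))   # allow paths through vertex k
  d
}

# s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
structure_stats_sketch <- function(a) {
  n <- NROW(a)
  d <- geodist_sketch(a)
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

path <- matrix(0, 4, 4)
path[cbind(1:3, 2:4)] <- 1    # directed path 1 -> 2 -> 3 -> 4
structure_stats_sketch(path)
```

For the directed path the series levels off below 1, since vertices cannot reach their predecessors; in a strongly connected graph it reaches 1 at the diameter.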

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small


subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
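As a minimal sketch of the CUG logic described above, the following base R code tests the number of mutual dyads against graphs drawn uniformly given order and edge count. All function names are hypothetical and this is not the cugtest implementation; loopless 0/1 digraphs are assumed.

```r
set.seed(1)
# Hypothetical helpers (base R only).
mutuals <- function(a) sum(a * t(a)) / 2     # number of mutual dyads

# Draw a loopless digraph uniformly given order n and edge count m.
rgraph_nm <- function(n, m) {
  a <- matrix(0, n, n)
  off <- which(row(a) != col(a))
  a[off[sample.int(length(off), m)]] <- 1
  a
}

cug_sketch <- function(a, stat, reps = 1000) {
  obs <- stat(a)
  sim <- replicate(reps, stat(rgraph_nm(NROW(a), sum(a))))
  c(obs = obs, p.upper = mean(sim >= obs), p.lower = mean(sim <= obs))
}

a <- matrix(0, 10, 10)
a[cbind(1:5, 6:10)] <- 1
a[cbind(6:10, 1:5)] <- 1     # 10 edges, all reciprocated
cug_sketch(a, mutuals)
```

Because every edge in the test graph is reciprocated, the observed count sits at the upper extreme of the conditional distribution, so the upper-tail p-value is small and the lower-tail p-value equals 1.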

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg   v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
1   35    8    3    9    2   10    9    3   10    8    8
2  119   40   10   47    8   59   47   13   56   39   38
3  346  155   41  180   35  223  185   52  211  149  153
4  791  457  130  504  114  601  527  163  572  425  462
5 1351  964  303 1000  282 1143 1061  375 1104  884  990
R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg  v1  v2  v3  v4  v5  v6  v7  v8  v9 v10
2   9   2   1   2   0   3   2   0   4   3   1
3  24   7   1  11   0  15   9   2  12   8   7
4  42  16   1  23   2  32  26   3  30  19  16
5  72  39   5  48   8  60  54  10  57  36  43


R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)


CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
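The distance-then-cluster workflow just described can be sketched in base R. The helper `se_hamming_sketch` is a hypothetical illustration rather than sedist itself; it assumes a 0/1 adjacency matrix and compares the rows and columns of each vertex pair while ignoring the pair's ties to each other and the diagonal.

```r
# Hypothetical helper: Hamming distance between the in- and out-neighborhoods
# of every pair of vertices.
se_hamming_sketch <- function(a) {
  n <- NROW(a)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    keep <- setdiff(seq_len(n), c(i, j))
    d[i, j] <- sum(a[i, keep] != a[j, keep]) +   # differing out-ties
      sum(a[keep, i] != a[keep, j])              # differing in-ties
  }
  d
}

a <- matrix(0, 6, 6)
a[1:3, 4:6] <- 1        # vertices 1-3 each send ties to vertices 4-6
d <- se_hamming_sketch(a)
cutree(hclust(as.dist(d)), k = 2)
```

Vertices 1 to 3 are exactly structurally equivalent here (as are 4 to 6), so their pairwise distances are zero and the two positions are recovered by cutting the dendrogram at two clusters.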


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
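Density reduction and expansion can be sketched in base R as follows. Both helpers are hypothetical illustrations (they are not blockmodel or blockmodel.expand); they assume a loopless 0/1 adjacency matrix and an integer class membership vector, and they exclude the diagonal when computing within-class densities.

```r
set.seed(42)
# Hypothetical helpers; base R only, density block type only.
block_density_sketch <- function(a, cls) {
  k <- max(cls)
  dens <- matrix(NA_real_, k, k)
  for (r in seq_len(k)) for (s in seq_len(k)) {
    blk <- a[cls == r, cls == s, drop = FALSE]
    if (r == s) {
      m <- NROW(blk)   # diagonal blocks: ignore self-ties
      dens[r, s] <- if (m > 1) (sum(blk) - sum(diag(blk))) / (m * (m - 1)) else 0
    } else dens[r, s] <- mean(blk)
  }
  dens
}

block_expand_sketch <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), sizes)
  p <- dens[cls, cls, drop = FALSE]    # image -> tie-probability matrix
  g <- matrix(rbinom(length(p), 1, p), NROW(p))
  diag(g) <- 0
  g
}

a <- matrix(0, 6, 6)
a[1:3, 4:6] <- 1
dens <- block_density_sketch(a, c(1, 1, 1, 2, 2, 2))
ge <- block_expand_sketch(dens, c(5, 2))   # new graph of order 7
```

Because the class sizes are supplied at expansion time, the simulated graph need not have the same order as the graph from which the image was estimated.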

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
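The boolean inner product characterization of composition can be written directly in base R. The function name `compose_sketch` is hypothetical and mirrors only the definition given above, not the sna operator.

```r
# Hypothetical helper: (v, v') is tied in the result iff some v'' satisfies
# (v, v'') in G and (v'', v') in G'.
compose_sketch <- function(g, gp) (g %*% gp > 0) * 1

g  <- matrix(0, 3, 3); g[1, 2] <- 1     # 1 -> 2 in G
gp <- matrix(0, 3, 3); gp[2, 1] <- 1    # 2 -> 1 in G'
compose_sketch(g, gp)                   # contains the loop (1, 1)
```

Neither input graph has a loop, yet the composition does, which is why the diagonal must be retained when analyzing composed graphs.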

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph


with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
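Equation (3) and the associated correlation and Hamming distance translate directly into base R. The helpers below are hypothetical illustrations (not gcov, gcor, or hdist) and assume loopless digraphs on a common vertex set, so that sums run over the $|V|(|V| - 1)$ ordered pairs.

```r
# Hypothetical helpers implementing equation (3) for loopless digraphs.
gcov_sketch <- function(g, h) {
  off <- row(g) != col(g)            # ordered pairs (i, j) with i != j
  x <- g[off]; y <- h[off]
  mean((x - mean(x)) * (y - mean(y)))
}
gcor_sketch <- function(g, h)
  gcov_sketch(g, h) / sqrt(gcov_sketch(g, g) * gcov_sketch(h, h))
hamming_sketch <- function(g, h) sum((g != h)[row(g) != col(g)])

g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, byrow = TRUE)   # directed 3-cycle
h <- t(g)                                   # the reversed cycle
gcor_sketch(g, g)
gcor_sketch(g, h)
hamming_sketch(g, h)
```

The reversed cycle uses exactly the edges the original does not, so the two graphs are perfectly negatively correlated and differ on all six edge variables.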

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
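For very small graphs, the optimization over matchings can be carried out by brute force, which makes the structural/labeling decomposition easy to see. The sketch below is a hypothetical base R illustration, not structdist; it assumes all vertex relabelings are permissible and that diagonals are zero.

```r
# Hypothetical helpers; exhaustive search is feasible only for tiny graphs.
perms_sketch <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms_sketch(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}
hamming_sketch <- function(g, h) sum(g != h)   # diagonals assumed zero

# Minimum Hamming distance over all relabelings of h.
structdist_sketch <- function(g, h) {
  best <- Inf
  for (p in perms_sketch(seq_len(NROW(h))))
    best <- min(best, hamming_sketch(g, h[p, p]))
  best
}

g <- matrix(0, 4, 4); g[cbind(1:3, 2:4)] <- 1   # path 1 -> 2 -> 3 -> 4
h <- g[c(3, 1, 4, 2), c(3, 1, 4, 2)]            # same graph, relabeled
hamming_sketch(g, h)       # labeled distance
structdist_sketch(g, h)    # structural distance
```

The gap between the labeled and structural distances is the labeling component. The number of permutations grows factorially with graph order, which is why heuristic search is used in practice.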

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
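The QAP null hypothesis can be sketched in a few lines of base R: the rows and columns of one graph are permuted jointly, and the statistic is recomputed on each relabeled copy. The helpers are hypothetical illustrations (not qaptest) and assume loopless digraphs on a common vertex set.

```r
set.seed(7)
# Hypothetical helpers; base R only.
gcor_sketch <- function(g, h) {
  off <- row(g) != col(g)
  cor(g[off], h[off])
}

qap_sketch <- function(g, h, reps = 1000) {
  obs <- gcor_sketch(g, h)
  sim <- replicate(reps, {
    p <- sample(NROW(h))             # random relabeling of h's vertices
    gcor_sketch(g, h[p, p])
  })
  c(obs = obs, p.upper = mean(sim >= obs), p.lower = mean(sim <= obs))
}

n <- 15
g <- matrix(rbinom(n^2, 1, 0.3), n); diag(g) <- 0
h <- g                                # h copies g ...
flip <- sample(which(row(g) != col(g)), 20)
h[flip] <- 1 - h[flip]                # ... except for 20 perturbed edge variables
qap_sketch(g, h)
```

Because relabeling preserves the internal structure of each graph, the permutation distribution controls for features such as density and degree heterogeneity while breaking the correspondence between the two edge sets.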


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:


            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's


parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
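The error-rate parameterization of the reporting process can be illustrated for a single edge in the simplest setting, where the false positive and false negative rates are treated as known. The helper below is a hypothetical base R sketch of that special case, assuming informants report independently given the true state; it is not the bbnam sampler, which also handles unknown error rates.

```r
# Hypothetical helper: posterior probability that an edge is present, given
# binary informant reports, known false positive (ep) and false negative (em)
# rates, and a prior edge probability.
edge_posterior_sketch <- function(reports, ep, em, prior = 0.5) {
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | edge present)
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | edge absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

reports <- c(1, 0, 0, 1, 0)   # two of five informants report the tie
edge_posterior_sketch(reports, ep = rep(0.04, 5), em = rep(0.4, 5))
```

When false negatives are much more common than false positives, a minority of positive reports can still make the edge highly probable, which is one reason simple majority rules may perform poorly on such data.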

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability


of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
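The conditional tie probability above is simple to evaluate numerically. The function below is a hypothetical base R helper written for illustration; its argument names (`d` for the baseline probability, `bias` for the bias event probabilities, `s` for the counts $s_k$) are assumptions, and it is not part of bn or rgbn.

```r
# Hypothetical helper: Pr(e_ij | T) from a baseline probability d, a vector of
# bias-event probabilities, and the counts s_k of potential bias events.
bn_tie_prob_sketch <- function(d, bias, s) {
  stopifnot(length(bias) == length(s))
  1 - (1 - d) * prod((1 - bias)^s)
}

# No bias events available: the probability reduces to the baseline.
bn_tie_prob_sketch(d = 0.17, bias = c(pi = 0.5, sigma = 0.25), s = c(0, 0))
# One reciprocity event and two shared-parent events available.
bn_tie_prob_sketch(d = 0.17, bias = c(pi = 0.5, sigma = 0.25), s = c(1, 2))
```

The tie fails to form only if the baseline event and every available bias event all fail, so each additional bias opportunity can only raise the tie probability.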

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

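$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \tag{5}$$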

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
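To make the notion of a network autocorrelation index concrete, the following base R sketch computes the textbook form of Moran's I for a vertex attribute x and a weight matrix w. The function name morans_i is hypothetical, and the normalization used by nacf may differ from this form; the sketch is meant only to show what such an index measures.

```r
# Textbook Moran's I (assumed form):
#   I = (n / sum(w)) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
morans_i <- function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
  d <- x - mean(x)                       # deviations from the mean
  (length(x) / sum(w)) * sum(w * outer(d, d)) / sum(d^2)
}

# Undirected path graph on four vertices: 1 - 2 - 3 - 4
w_path <- matrix(0, 4, 4)
w_path[cbind(1:3, 2:4)] <- 1
w_path <- w_path + t(w_path)

i_smooth      <- morans_i(c(1, 2, 3, 4), w_path)    # neighbors are similar
i_alternating <- morans_i(c(1, -1, 1, -1), w_path)  # neighbors are opposed
```

Positive values indicate that adjacent vertices tend to have similar scores, while negative values indicate that they tend to differ; which weight matrix is appropriate remains a substantive modeling choice.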


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical Beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

              e^-       e^+
Min     0.1443951 0.0004238
1stQ    0.3126975 0.0167584
Median  0.3678306 0.0294646
Mean    0.3783663 0.0493688
3rdQ    0.4423027 0.0574099
Max     0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20  Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116  Med: 0.9992194  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1
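The "Potential Scale Reduction" entry in the MCMC diagnostics refers to the convergence statistic of Gelman and Rubin (1992). As a rough guide to how such a statistic is obtained, the following base R sketch computes a basic form of sqrt(Rhat) from a matrix of draws with one column per replicate chain. The function name psrf is hypothetical, and the exact variant computed inside bbnam is an assumption here; it may differ in detail.

```r
# Basic Gelman-Rubin statistic for a single scalar quantity.
# chains: matrix with one row per retained draw and one column per chain
psrf <- function(chains) {
  n <- nrow(chains)
  w <- mean(apply(chains, 2, var))       # mean within-chain variance
  b <- n * var(colMeans(chains))         # between-chain variance
  var_plus <- (n - 1) / n * w + b / n    # pooled variance estimate
  sqrt(var_plus / w)
}

set.seed(42)
# Five well-mixed chains of 20 draws each (same layout as the example above)
mixed <- matrix(rnorm(5 * 20), nrow = 20, ncol = 5)
# Five chains centered on different values, i.e., chains that disagree
stuck <- sweep(mixed, 2, c(0, 2, 4, 6, 8), "+")

r_mixed <- psrf(mixed)
r_stuck <- psrf(stuck)
```

Values near 1 (as in the summary output above) suggest that the replicate chains agree with one another; values well above 1 suggest that longer burn-in or more draws are needed.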

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
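One simple way to check a candidate pair of error-rate priors against the "perverse" region is to estimate, by simulation, the prior probability that the false negative and false positive rates sum to more than 1. The sketch below does this in base R for independent beta priors; the function name perverse_mass is hypothetical and is not part of sna.

```r
# Monte Carlo estimate of prior mass on the region em + ep > 1,
# assuming independent Beta(a_em, b_em) and Beta(a_ep, b_ep) priors.
perverse_mass <- function(a_em, b_em, a_ep, b_ep, ndraw = 1e5) {
  em <- rbeta(ndraw, a_em, b_em)   # false negative rates
  ep <- rbeta(ndraw, a_ep, b_ep)   # false positive rates
  mean(em + ep > 1)
}

set.seed(1)
mass_example <- perverse_mass(2, 11, 2, 11)  # Beta(2, 11) priors, as in the example
mass_flat    <- perverse_mass(1, 1, 1, 1)    # uniform priors on both rates
```

Under the Beta(2, 11) priors essentially no mass falls in the perverse region, whereas uniform priors place roughly half of their mass there, which is one reason mildly informative error priors are often preferable.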

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
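The difference between these estimators is easiest to see in a direct implementation. The following base R sketch gives simplified versions of the central graph (majority rule across informants) and the union/intersection LAS (which use only the reports of the two actors involved in each potential tie). The function names and the tie-breaking rule are assumptions made for illustration; the consensus function in sna should be preferred in practice.

```r
# dat[k, i, j] = 1 if informant k reports a tie from i to j.
# Informants are assumed to be the n actors themselves (an n x n x n array).

# Central graph: a tie is declared if more than half of informants report it
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Locally aggregated structure: combine only the reports of i and j
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      reports <- c(dat[i, i, j], dat[j, i, j])
      out[i, j] <- if (rule == "intersection") min(reports) else max(reports)
    }
  }
  out
}

# Toy data: three informants; the tie 2 -> 3 is missed by actor 2
dat_toy <- array(0, dim = c(3, 3, 3))
dat_toy[1, 1, 2] <- 1
dat_toy[2, 1, 2] <- 1
dat_toy[c(1, 3), 2, 3] <- 1

cg      <- central_graph(dat_toy)
las_int <- las(dat_toy, "intersection")
las_uni <- las(dat_toy, "union")
```

In the toy data a single false negative by one of the two involved actors removes the 2 -> 3 tie from the intersection LAS, while the central graph and union LAS retain it, which mirrors the behavior described above.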

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
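The two qr.solve calls used to generate the data in this example rely on the reduced forms of Equations 4 and 5: with a single weight matrix in each term, the disturbances satisfy (I - psi Z) eps = nu and the responses satisfy (I - theta W) y = X beta + eps. The following base R sketch, which uses hypothetical parameter values and a stand-in random graph generator rather than rgraph, checks numerically that data generated this way satisfy the original structural equations.

```r
set.seed(7)
n <- 30

# Stand-in for a random directed graph generator (loops excluded)
rand_adj <- function(n, p = 0.1) {
  a <- matrix(rbinom(n * n, 1, p), n, n)
  diag(a) <- 0
  a
}

w1 <- rand_adj(n)                 # AR weight matrix
w2 <- rand_adj(n)                 # MA weight matrix
x <- matrix(rnorm(n * 2), n, 2)   # covariates
beta <- c(1, -0.5)                # illustrative coefficients
theta <- 0.1                      # illustrative AR parameter
psi <- 0.15                       # illustrative MA parameter
nu <- rnorm(n, 0, 0.1)            # iid disturbances

# Reduced forms: solve (I - psi W2) eps = nu, then (I - theta W1) y = X beta + eps
eps <- qr.solve(diag(n) - psi * w2, nu)
y <- qr.solve(diag(n) - theta * w1, x %*% beta + eps)

# Residuals of the structural equations; both should be numerically zero
res_ar <- as.vector(y - (theta * (w1 %*% y) + x %*% beta + eps))
res_ma <- as.vector(eps - (psi * (w2 %*% eps) + nu))
```

Solving the linear systems directly, rather than forming explicit matrix inverses, is the usual choice for numerical stability.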

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot, with theoretical quantiles against sample quantiles; Net Influence Plot.)

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

Figure 1: Sample partial neighborhoods of increasing order; vertex v is adjacent to vertex v' in the ith panel iff v' belongs to the ith order partial neighborhood of v. (Panels: Partial Neighborhood of Order 1 through Order 9.)

and Hall (1970), as well as layouts based on general multidimensional scaling and eigenstructure procedures, circular layouts, and random placement. User-supplied functions can also be employed by creating an appropriate gplot.layout routine; required arguments are described in the gplot.layout manual page. For "target diagrams," in which graphs are plotted along concentric circles based on the magnitude of a specified covariate, gplot.target supplies a useful front-end to gplot. The layout method used in this case is that of Brandes et al. (2003), which may also be employed directly within gplot. Should no available layout suffice, coordinates may be set manually; interactive vertex placement is also supported.
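As an illustration of what a user-supplied layout routine might look like, the sketch below defines a hypothetical layout function in base R that places vertices along a spiral. It assumes the convention that layout routines take an adjacency matrix d and a list of parameters layout.par, and return a two-column matrix of coordinates; the gplot.layout manual page should be consulted for the authoritative argument list. The name gplot.layout.spiral and the turns parameter are inventions for this example.

```r
# Hypothetical user-supplied layout: vertices placed along a spiral.
#   d:          adjacency matrix (only its dimension is used here)
#   layout.par: optional list; layout.par$turns sets the number of rotations
gplot.layout.spiral <- function(d, layout.par = NULL) {
  n <- NROW(d)
  turns <- if (is.null(layout.par$turns)) 2 else layout.par$turns
  angle <- seq(0, 2 * pi * turns, length.out = n)
  radius <- seq(0.2, 1, length.out = n)
  # One row per vertex: x coordinate, y coordinate
  cbind(radius * cos(angle), radius * sin(angle))
}

adj <- matrix(0, 6, 6)
coords <- gplot.layout.spiral(adj, list(turns = 1))
```

If the naming convention holds, a routine defined this way would presumably be selected with something like gplot(adj, mode = "spiral") once sna is loaded; that call is not executed here.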

While two-dimensional visualization is favored in most settings, it can also be useful to examine complex networks in three dimensions. Installing R's optional rgl package enables gplot3d, which allows interactive network visualization in three dimensions. Available settings are similar to gplot, with layout algorithms analogously controlled by the gplot3d.layout functions. Interface and output methods are as per rgl, and may vary slightly by platform.

Where highly customized displays are desired, it may be useful to have access to the low-level tools used by gplot and gplot3d to display vertices and edges. gplot.vertex, gplot.arrow, gplot.loop, gplot3d.arrow, and gplot3d.loop can all be used directly to place gplot elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

Figure 2: Sample cumulative neighborhoods of increasing order; vertex v is adjacent to vertex v' in the ith panel iff v' belongs to the ith order cumulative neighborhood of v. (Panels: Cumulative Neighborhood of Order 1 through Order 9.)

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below), or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc., are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. (Panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options.)

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions. (Panels, left to right: gplot.vertex Example, gplot.arrow Example, gplot.loop Example.)

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times G \mapsto \mathbb{R}$, where $G$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : G \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3-5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions.

Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{dt}(v,G) \equiv c_{d+}(v,G) + c_{d-}(v,G)$). All of these are supported via degree.

Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna.

Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
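The degree and closeness definitions above can be traced by hand on a small digraph. The following is a minimal base-R sketch, not sna's implementation: the helper name geo_dist and the Floyd-Warshall routine are illustrative assumptions, and the adjacency matrix is taken to be binary with A[i, j] == 1 denoting an edge from i to j.

```r
# Base-R sketch (illustrative; not the sna code path).
# A[i, j] == 1 means there is an edge from vertex i to vertex j.
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)  # directed 3-cycle

outdeg <- rowSums(A)       # |N+(v)|
indeg  <- colSums(A)       # |N-(v)|
totdeg <- outdeg + indeg   # total ("Freeman") degree

# Hypothetical helper: geodesic distances via Floyd-Warshall.
# Pairs with no connecting path keep distance Inf.
geo_dist <- function(A) {
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(nrow(A))) {
    D <- pmin(D, outer(D[, k], D[k, ], "+"))
  }
  D
}

D <- geo_dist(A)
n <- nrow(A)
clo <- (n - 1) / rowSums(D)   # evaluates to 0 if any distance is Inf
```

On the 3-cycle every vertex has total degree 2 and distances 1 and 2 to the other two vertices, so each closeness score is 2/3. Deleting any one edge makes some row of D contain Inf, which drives the corresponding closeness score to 0, the fragility noted in the text.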

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation.

Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
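The Bonacich power formula can be evaluated directly with a linear solve. The sketch below is a base-R illustration under stated assumptions: the function name bonacich_power is hypothetical, and the choice of $\alpha$ (scaling so that the sum of squared scores equals the number of vertices) is an assumption of this sketch rather than a statement about bonpow's internals.

```r
# Base-R sketch of c_bp = alpha * (I - beta * A)^{-1} A 1 (illustrative only).
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  raw <- as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
  # Assumed scaling: alpha chosen so that the squared length equals n.
  raw * sqrt(n / sum(raw^2))
}

A <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 1, 0), nrow = 3, byrow = TRUE)

bp0    <- bonacich_power(A, beta = 0)
ratio  <- bp0 / rowSums(A)              # constant: scores proportional to outdegree
bp_neg <- bonacich_power(A, beta = -0.25)
```

With beta = 0 the inverse reduces to the identity, so the scores are a rescaling of the row sums (outdegree), which is the limiting behavior described in the text. Negative beta requires only that the matrix I - beta * A remain invertible.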

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
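The five role definitions translate directly into a triple loop over ordered triads. The following base-R sketch is a brute-force illustration, not the brokerage function itself: the name brokerage_counts is hypothetical, the column labels follow the w_I, w_O, b_IO, b_OI, b_O convention used in the output later in this section, and no moments or z-scores are computed.

```r
# Brute-force base-R sketch of Gould-Fernandez role counts (illustrative only).
# A: binary adjacency matrix; s: vector of vertex states (group labels).
brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("w_I", "w_O", "b_IO", "b_OI", "b_O")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {        # j is the brokering vertex
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # Require i -> j, j -> k, and no direct i -> k tie.
        if (A[i, j] == 0 || A[j, k] == 0 || A[i, k] != 0) next
        if (s[i] == s[j] && s[j] == s[k]) {
          role <- "w_I"          # coordinating
        } else if (s[i] == s[k]) {
          role <- "w_O"          # itinerant
        } else if (s[j] == s[k]) {
          role <- "b_IO"         # gatekeeping
        } else if (s[i] == s[j]) {
          role <- "b_OI"         # representative
        } else {
          role <- "b_O"          # liaison
        }
        out[j, role] <- out[j, role] + 1
      }
    }
  }
  cbind(out, t = rowSums(out))
}

# Path 1 -> 2 -> 3 with vertices 1 and 2 in one group and vertex 3 in another.
A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- 1
res <- brokerage_counts(A, s = c(1, 1, 2))
```

Here vertex 2 brokers the single ordered triad (1, 2, 3); because the source shares its group and the target does not, the triad is counted under the representative role.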

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)
[1] 10 7 13 4 9 9 8 8 8 12

R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)
[1] 21 6 27 1 14 15 6 7 7 21

R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE

R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $i, j, k \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores; see footnote 2) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

Footnote 2: For instance, when all vertices are automorphically equivalent.
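Equations (1) and (2) can be checked numerically on a star graph. The base-R sketch below works from a vector of degree scores rather than calling centralization; the function name freeman_centralization is hypothetical, and the normalizing constant $(n-1)(n-2)$ is the standard maximum for degree centralization on undirected graphs of order $n$.

```r
# Base-R sketch: raw and normalized Freeman centralization for degree
# on an undirected star with n vertices in total (illustrative only).
freeman_centralization <- function(scores) sum(max(scores) - scores)

n <- 6
star_deg <- c(n - 1, rep(1, n - 1))   # one hub plus n - 1 leaves

raw  <- freeman_centralization(star_deg)        # equation (1)
raw2 <- n * (max(star_deg) - mean(star_deg))    # equation (2)

tmax <- (n - 1) * (n - 2)   # maximum degree centralization at order n
normalized <- raw / tmax
```

Both forms give 20 for n = 6, and dividing by the theoretical maximum yields a normalized score of 1, matching the statement that the star attains maximum degree centralization.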

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1] 0 21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

Footnote 3: $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
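The distinction between dyadic and edgewise reciprocity, and the weak form of transitivity, can be made concrete with a few lines of base R. This is a sketch on a hand-built four-vertex digraph without loops or missing data; it is not how gden, grecip, or gtrans are implemented, and the variable names are illustrative.

```r
# Base-R sketch of density, reciprocity, and weak transitivity (illustrative).
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- A[1, 3] <- A[2, 3] <- 1   # 1 <-> 2, 1 -> 3, 2 -> 3

n <- nrow(A)
density <- sum(A) / (n * (n - 1))   # loops excluded

# Dyad census from the upper triangle and its mirror image.
up  <- upper.tri(A)
fwd <- A[up]
bwd <- t(A)[up]
n_mutual <- sum(fwd == 1 & bwd == 1)
n_null   <- sum(fwd == 0 & bwd == 0)
n_asym   <- sum(up) - n_mutual - n_null

recip_dyadic   <- (n_mutual + n_null) / sum(up)   # symmetric dyads / all dyads
recip_edgewise <- 2 * n_mutual / sum(A)           # reciprocated edges / all edges

# Weak transitivity: share of two-paths i -> j -> k (i != k) closed by i -> k.
two_paths <- A %*% A
diag(two_paths) <- 0
trans_weak <- sum(two_paths * A) / sum(two_paths)
```

The example has one mutual, two asymmetric, and three null dyads, so the dyadic reciprocity is 4/6 while the edgewise reciprocity is 2/4; the two measures diverge because null dyads count as symmetric only in the dyadic version.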

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist.

An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
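Given a matrix of geodesic distances, the structure statistics reduce to a cumulative count. The following base-R sketch assumes the distance matrix D is already available (with Inf for unreachable pairs, if any); the function name structure_stats is hypothetical and is not the sna routine.

```r
# Base-R sketch of the structure statistics s_0, ..., s_{N-1} (illustrative).
# D[j, k] is the geodesic distance from j to k, with D[j, j] == 0.
structure_stats <- function(D) {
  N <- nrow(D)
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  names(s) <- 0:(N - 1)
  s
}

# Distances for a directed 3-cycle 1 -> 2 -> 3 -> 1.
D <- matrix(c(0, 1, 2,
              2, 0, 1,
              1, 2, 0), nrow = 3, byrow = TRUE)
s <- structure_stats(D)
```

For the 3-cycle the series is 1/3, 2/3, 1: each vertex reaches itself at distance 0, one more vertex at distance 1, and the whole graph at distance 2. The first term is always 1/N, since only the diagonal has distance 0.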

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
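The logic of a one-tailed CUG test can be sketched by simulation in base R. The example below conditions on order and on the exact number of edges, and uses the count of mutual dyads as the test statistic; the helper names count_mutual and rand_graph_edges are hypothetical, the observed graph is made up for illustration, and this is a minimal Monte Carlo sketch rather than a description of cugtest.

```r
# Base-R sketch of a CUG test conditioning on order and edge count
# (illustrative; helper names and data are assumptions of this sketch).
set.seed(1)

count_mutual <- function(A) sum(A * t(A)) / 2

# Draw a digraph uniformly from those with n vertices and exactly m edges.
rand_graph_edges <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))   # off-diagonal cells only
  A[sample(cells, m)] <- 1
  A
}

# Hypothetical observed graph: two mutual dyads and one asymmetric edge.
obs <- matrix(0, 6, 6)
obs[cbind(c(1, 2, 3, 4, 1), c(2, 1, 4, 3, 5))] <- 1

t_obs <- count_mutual(obs)
reps  <- replicate(500, count_mutual(rand_graph_edges(nrow(obs), sum(obs))))

p_upper <- mean(reps >= t_obs)   # estimate of Pr(t(H) >= t(G))
p_lower <- mean(reps <= t_obs)   # estimate of Pr(t(H) <= t(G))
```

Because draws with t(H) equal to t(G) count toward both tails, the two estimated p-values sum to at least 1. Replacing the fixed edge count with independent Bernoulli draws at the observed density would give the weaker, expectation-matching null described in the text.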

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])
$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and outneighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
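A Hamming-type structural equivalence distance, followed by hierarchical clustering, can be sketched in base R as follows. The function se_hamming is a hypothetical helper that compares two vertices' rows and columns over all third parties (ignoring the ties between the two vertices themselves, a simplifying assumption); it is not sedist, whose handling of such cells may differ.

```r
# Base-R sketch of a Hamming structural equivalence distance (illustrative).
se_hamming <- function(A) {
  n <- nrow(A)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      k <- setdiff(seq_len(n), c(i, j))   # third parties only
      d[i, j] <- sum(A[i, k] != A[j, k]) + sum(A[k, i] != A[k, j])
    }
  }
  d
}

# Undirected star: vertex 1 is the hub, vertices 2 to 4 are pendants.
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1

d <- se_hamming(A)
classes <- cutree(hclust(as.dist(d)), k = 2)   # two positions
```

The three pendants have identical ties to every third party, so their pairwise distances are 0 and clustering places them in one position, with the hub alone in the other. This mirrors the remark in the text that pendants are among the few cases where exact structural equivalence arises.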


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
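The density block type described above can be illustrated with a few lines of base R. This is a sketch under stated assumptions, not the blockmodel implementation: block_density is a hypothetical helper, the input is a single 0/1 adjacency matrix, and the diagonal is neglected.

```r
# Hypothetical helper: mean tie value within each pair of vertex classes.
block_density <- function(adj, membership) {
  classes <- sort(unique(membership))
  k <- length(classes)
  dens <- matrix(NA_real_, k, k, dimnames = list(classes, classes))
  diag(adj) <- NA  # neglect the diagonal (loops are not counted)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      cells <- adj[membership == classes[a], membership == classes[b]]
      dens[a, b] <- mean(cells, na.rm = TRUE)
    }
  }
  dens
}

# Two structural equivalence classes: ties run only between the classes.
adj <- matrix(0, 6, 6)
adj[1:3, 4:6] <- 1
adj[4:6, 1:3] <- 1
membership <- c(1, 1, 1, 2, 2, 2)

dens <- block_density(adj, membership)
dens
```

Because the classes here are exact structural equivalence classes, every block density is either 0 or 1. A class with a single member has no off-diagonal cells in its own block, so its within-class density comes back as NaN in this sketch.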

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
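The expansion step for density blocks amounts to drawing independent Bernoulli edges from an expanded probability matrix. The sketch below shows the idea in base R; expand_density_image is a hypothetical helper, not blockmodel.expand, and it assumes a square matrix of block densities with no missing values.

```r
# Hypothetical helper: draw one digraph from a density block image, with
# sizes[k] vertices assigned to class k.
expand_density_image <- function(dens, sizes) {
  membership <- rep(seq_along(sizes), times = sizes)
  n <- length(membership)
  p <- dens[membership, membership, drop = FALSE]  # n x n edge probabilities
  g <- matrix(rbinom(n * n, 1, p), n, n)           # independent Bernoulli draws
  diag(g) <- 0                                     # no loops
  g
}

set.seed(1)
# Stochastic image: dense within classes, sparse between them.
image <- matrix(c(0.9, 0.1,
                  0.1, 0.9), 2, 2, byrow = TRUE)
g_sim <- expand_density_image(image, sizes = c(3, 5))

# Degenerate image (densities of 0 or 1) gives a deterministic result.
g_det <- expand_density_image(matrix(c(0, 1, 1, 0), 2, 2), sizes = c(2, 3))
```

Calling the helper repeatedly yields a Monte Carlo sample; the class sizes need not match those of the graph from which the densities were estimated.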

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
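The boolean inner product characterization translates directly into matrix arithmetic. The following base R sketch uses a hypothetical helper, compose_graphs, on 0/1 adjacency matrices; it is meant only to illustrate the definition, including the appearance of loops.

```r
# Hypothetical helper: composition via the boolean matrix product.
# (v, v') is an edge iff some v'' has (v, v'') in a and (v'', v') in b.
compose_graphs <- function(a, b) {
  (a %*% b > 0) * 1
}

# A mutual dyad with no loops: 1 -> 2 and 2 -> 1.
a <- matrix(c(0, 1,
              1, 0), 2, 2, byrow = TRUE)

comp <- compose_graphs(a, a)
comp  # two-step paths 1 -> 2 -> 1 and 2 -> 1 -> 2 create loops
```

Here the composition of a loopless graph with itself consists only of loops, which is why the diagonal has to be retained when interpreting compositions.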

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| \left(|V| - 1\right)}, \qquad (3)$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V| (|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\, \mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
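For readers who want to see these quantities computed by hand, the sketch below implements the graph covariance, graph correlation, and Hamming distance for two digraphs on the same vertex set. It uses base R only; the function names are hypothetical stand-ins and are not the gcov, gcor, or hdist implementations.

```r
# Graph covariance over the |V|(|V| - 1) ordered pairs (i, j), i != j.
graph_cov <- function(a, b) {
  off <- row(a) != col(a)
  mean((a[off] - mean(a[off])) * (b[off] - mean(b[off])))
}

# Graph correlation: covariance scaled by the two graph variances.
graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}

# Hamming distance: number of ordered pairs whose edge states differ.
hamming_dist <- function(a, b) {
  off <- row(a) != col(a)
  sum(a[off] != b[off])
}

# A directed 4-cycle and its complement (diagonal excluded).
g <- matrix(0, 4, 4)
g[1, 2] <- g[2, 3] <- g[3, 4] <- g[4, 1] <- 1
h <- 1 - g
diag(h) <- 0

graph_cor(g, g)     # a graph is perfectly correlated with itself
graph_cor(g, h)     # and perfectly negatively correlated with its complement
hamming_dist(g, h)  # every one of the 12 ordered pairs differs
```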

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
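The idea of optimizing over matchings can be made concrete for very small graphs by exhaustive search over vertex relabelings. The sketch below computes a structural Hamming distance in base R under the assumption that all vertex permutations are permissible; all_perms and struct_hamming are hypothetical helpers, and the approach is only feasible for a handful of vertices (the number of permutations grows factorially).

```r
# Hypothetical helper: list all permutations of the vector v.
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Labeled Hamming distance between two loopless 0/1 adjacency matrices.
hamming <- function(a, b) sum(a != b)

# Structural distance: minimum labeled distance over all relabelings of b.
struct_hamming <- function(a, b) {
  perms <- all_perms(seq_len(nrow(a)))
  min(vapply(perms, function(p) hamming(a, b[p, p]), numeric(1)))
}

# A directed path 1 -> 2 -> 3 -> 4 and a relabeled copy of it.
a <- matrix(0, 4, 4)
a[1, 2] <- a[2, 3] <- a[3, 4] <- 1
p <- c(3, 1, 4, 2)
b <- a[p, p]

hamming(a, b)         # the labeled graphs share no edges
struct_hamming(a, b)  # but they are isomorphic
```

The gap between the two values is the part of the labeled distance attributable to labeling rather than to structure.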

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
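A short base R sketch of this exploratory workflow follows. It assumes a stack of digraphs stored as an m x n x n array of 0/1 values (simulated here purely for illustration) and computes the inter-graph quantities by hand rather than with hdist or gcor.

```r
set.seed(1)
n <- 8  # vertices per graph
m <- 6  # number of graphs
graphs <- array(rbinom(m * n * n, 1, 0.3), dim = c(m, n, n))

# One row of off-diagonal edge variables per graph.
off <- row(graphs[1, , ]) != col(graphs[1, , ])
edges <- t(apply(graphs, 1, function(g) g[off]))

# For 0/1 data, the Manhattan distance between rows is the Hamming distance.
hd <- as.matrix(dist(edges, method = "manhattan"))

coords <- cmdscale(hd, k = 2)    # metric MDS of the graph set
hc <- hclust(as.dist(hd))        # hierarchical clustering of the graphs

# Network PCA: eigendecomposition of the graph correlation matrix.
gcm <- cor(t(edges))
pca <- eigen(gcm)
pca$values / sum(pca$values)     # share of variance per component
```

Since the graphs here are independent random draws, no strong low-dimensional structure should be expected; with real data the leading components and the MDS layout are where any shared structure would show up.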

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
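The logic of a simple QAP test can be sketched in a few lines of base R: the rows and columns of one graph are permuted together, and the statistic is recomputed for each relabeling. The helper names below (off_diag_cor, qap_sketch) are hypothetical and this is not the qaptest implementation; the data are simulated for illustration.

```r
# Product-moment correlation between off-diagonal edge variables.
off_diag_cor <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

# Hypothetical helper: permutation (QAP) null distribution for stat(a, b).
qap_sketch <- function(a, b, stat = off_diag_cor, reps = 1000) {
  observed <- stat(a, b)
  n <- nrow(b)
  null <- replicate(reps, {
    p <- sample(n)      # random relabeling of the vertices of b
    stat(a, b[p, p])    # rows and columns are permuted together
  })
  list(observed = observed, null = null,
       p_ge = mean(null >= observed),  # Pr(null statistic >= observed)
       p_le = mean(null <= observed))  # Pr(null statistic <= observed)
}

set.seed(7)
n <- 10
a <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(a) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)  # perturb roughly 10% of cells
b <- abs(a - flip)
diag(b) <- 0

res <- qap_sketch(a, b, reps = 200)
res$observed
res$p_ge
```

Permuting rows and columns jointly preserves the internal structure of each graph, which is what distinguishes this null from one that shuffles edge variables independently.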


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled (structural) correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
                 Estimate  Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14    0.000   1.000     0.000
x1           1.000000e+00    1.000   0.000     0.000
x2           4.000000e+00    1.000   0.000     0.000
x3           2.000000e+00    1.000   0.000     0.000
x4          -7.553990e-16    0.369   0.631     0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
              Estimate     Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173   0.680   0.320     0.503
x1           0.9411361  2.5628914   0.985   0.015     0.019
x2           4.1473292 63.2648084   1.000   0.000     0.000
x3           1.8630911  6.4436238   1.000   0.000     0.000
x4          -0.1757242  0.8388493   0.318   0.682     0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
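To make the simpler data analytic estimators concrete, the sketch below gives base R versions of a majority-rule central graph and of the union and intersection locally-aggregated structures. These are illustrative stand-ins, not the consensus implementation: the helper names are hypothetical, reports are assumed to be stored as an informant x sender x receiver array of 0/1 values, and informant k is assumed to be vertex k.

```r
# Central graph by majority rule: an edge is kept iff more than half of the
# informants report it (this minimizes the total Hamming distance to reports).
central_graph_sketch <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Locally-aggregated structure: the i -> j tie is judged only from the
# reports of i and j themselves.
las_sketch <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      reports <- c(dat[i, i, j], dat[j, i, j])
      out[i, j] <- if (rule == "intersection") min(reports) else max(reports)
    }
  }
  out
}

set.seed(42)
n <- 6
g <- matrix(rbinom(n * n, 1, 0.4), n, n)
diag(g) <- 0

# Error-free reports: every informant reproduces g exactly.
clean <- array(rep(g, each = n), dim = c(n, n, n))

# Noisy reports: each reported cell is flipped with probability 0.1.
flips <- array(rbinom(n^3, 1, 0.1), dim = c(n, n, n))
noisy <- abs(clean - flips)

mean(central_graph_sketch(noisy) == g)
mean(las_sketch(noisy, "intersection") == g)
mean(las_sketch(noisy, "union") == g)
```

The intersection rule can only remove ties relative to the union rule, so it trades false positives for false negatives; which is preferable depends on the error regime.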

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} | T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
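The conditional nomination probability is straightforward to evaluate numerically. The sketch below is a direct transcription of the formula into base R; biased_net_prob is a hypothetical helper, and the bias labels and probabilities in the example are illustrative values rather than estimates from any data set.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: baseline probability Pr(B_e)
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      vector of counts s_k(i, j, T) of potential bias events of type k
biased_net_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Illustrative values: two bias types, occurring once and twice respectively.
p_bias <- c(type1 = 0.3, type2 = 0.2)
biased_net_prob(p_base = 0.1, p_bias = p_bias, s = c(1, 2))
# 1 - 0.9 * 0.7 * 0.8^2 = 0.5968

# With no potential bias events, only the baseline probability remains.
biased_net_prob(p_base = 0.1, p_bias = p_bias, s = c(0, 0))
```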

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

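$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$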

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
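As a small illustration of the kind of index used when screening candidate weight matrices, the sketch below computes Moran's I in base R using its standard definition, $I = (n / \sum_{ij} w_{ij}) \sum_{ij} w_{ij} (y_i - \bar{y})(y_j - \bar{y}) / \sum_i (y_i - \bar{y})^2$. The helper name morans_i is hypothetical, and this is not the nacf implementation.

```r
# Moran's I for attribute vector y and weight (adjacency) matrix w.
morans_i <- function(y, w) {
  z <- y - mean(y)  # centered attribute values
  (length(y) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Undirected path on four vertices: 1 - 2 - 3 - 4.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)

morans_i(c(1, -1, 1, -1), w)  # neighbors always differ: I = -1
morans_i(c(1, 1, -1, -1), w)  # neighbors mostly agree:  I = 1/3
```

Values near zero would suggest that the candidate weight matrix captures little of the dependence in the attribute, while strongly positive or negative values point toward an AR or MA term built on that matrix.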


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned bbnam object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116    Med: 0.9992194    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors, or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9    BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0    BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

Figure 2: Sample cumulative neighborhoods of increasing order (panels: Cumulative Neighborhood of Order 1 through Order 9); vertex v is adjacent to vertex v′ in the ith panel iff v′ belongs to the ith order cumulative neighborhood of v.

elements within arbitrary displays. Options for these functions are flexible, and similar in form to those employed in the gplot front-end routines. It is also possible to change the behavior of the front-end visualization functions by modifying these functions, should this become necessary for more exotic applications.

All of the above functions display relational information in sociogram form, i.e., as closed shapes connected by edges. It is also possible to visualize adjacency matrices directly (i.e., as a tabular display) using the plot.sociomatrix function. While this is rarely useful as an exploratory tool, it can be helpful when visualizing block structure (see Section 2.5 below) or when examining matrices which are too large to display effectively using the standard print method.

gplot is a versatile routine with many options, only a few of which can be illustrated here. Curved edges, variable vertex shapes, labels, etc. are among the currently supported features. (Primitive interactive vertex placement is also supported via the interactive option, which can be useful in refining complex displays.) Some examples of the use of gplot (and plot.sociomatrix) are shown here:

R> g <- rgraph(5, diag = TRUE)
R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Figure 3: Sample visualizations using gplot with multiple layout and display options. Panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options.

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions. Panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v,G) \equiv c_{d^+}(v,G) + c_{d^-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
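The degree, closeness, betweenness, and stress definitions above can be traced by hand on a tiny digraph. The following is a minimal base-R sketch, not the sna implementation: the four-vertex example graph and all variable names are illustrative assumptions, and betweenness is summed over ordered pairs of third parties, which is one reading of the definition for directed graphs.

```r
# Base-R sketch (not the sna implementation) of degree, closeness,
# betweenness, and stress centrality for a small loopless digraph.
# A[i, j] = 1 encodes an edge from vertex i to vertex j.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 3] <- A[3, 4] <- A[1, 3] <- 1
n <- nrow(A)

outdeg <- rowSums(A)          # |N+(v)|
indeg  <- colSums(A)          # |N-(v)|
totdeg <- outdeg + indeg      # total ("Freeman") degree

# Geodesic distances d and geodesic counts g. The number of walks of
# length k from i to j equals the number of geodesics when k = d(i, j).
d <- matrix(Inf, n, n)
diag(d) <- 0
g <- diag(n)
P <- diag(n)
for (k in seq_len(n - 1)) {
  P <- P %*% A
  new <- is.infinite(d) & P > 0
  d[new] <- k
  g[new] <- P[new]
}

# Closeness: (n - 1) / sum of distances; an unreachable vertex gives 0.
clo <- (n - 1) / rowSums(d)

# Betweenness and stress over ordered pairs (s, t) of third parties.
betw <- stress <- numeric(n)
for (v in seq_len(n)) for (s in seq_len(n)) for (t in seq_len(n)) {
  if (s == v || t == v || s == t || g[s, t] == 0) next
  if (d[s, v] + d[v, t] == d[s, t]) {
    through <- g[s, v] * g[v, t]       # (s, t) geodesics containing v
    betw[v]   <- betw[v] + through / g[s, t]
    stress[v] <- stress[v] + through   # same ratio with denominator 1
  }
}

print(rbind(outdeg, indeg, totdeg, clo, betw, stress))
```

In this example only vertex 1 can reach every other vertex, so it is the only vertex with nonzero closeness, which illustrates the fragility noted above.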

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
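The Bonacich formula above is a single linear solve, which makes its limiting behavior easy to check numerically. The sketch below is a base-R illustration rather than sna's bonpow: the function name bonacich_sketch and the example matrix are hypothetical, and the scaling of alpha (squared scores summing to |V|) is an assumption about the convention.

```r
# Base-R sketch of c_bp = alpha * (I - beta * A)^{-1} A 1 (not sna's bonpow).
bonacich_sketch <- function(A, beta) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))
  # Assumed convention: choose alpha so the squared scores sum to |V|.
  alpha <- sqrt(n / sum(raw^2))
  as.vector(alpha * raw)
}

# Hypothetical 4-vertex digraph in which every vertex has outdegree >= 1.
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              1, 0, 0, 0), nrow = 4, byrow = TRUE)

b0 <- bonacich_sketch(A, beta = 0)
print(b0 / rowSums(A))                   # constant ratio: proportional to outdegree
print(bonacich_sketch(A, beta = -0.5))   # negative beta reverses dependence on alters
```

Under the assumed scaling, a 10-vertex graph with outdegrees 6, 3, 5, 2, 5, 4, 4, 4, 5, 6 would give alpha = sqrt(10 / 208), approximately 0.2192645, which agrees with the constant ratio printed in the bonpow example later in this section.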

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex $v$ is defined as the number of ordered pairs $(v',v'')$ such that $(v',v),(v,v'') \in E$ and $(v',v'') \notin E$, that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i,v_j,v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
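The role definitions above amount to counting open two-paths through each vertex and classifying them by the states of the three endpoints. The following base-R sketch follows the definitions exactly as worded in the text; it is not sna's brokerage function, and the name broker_counts, the three-vertex cycle, and the membership vector are illustrative assumptions.

```r
# Base-R sketch of Gould-Fernandez role counts, following the text's
# definitions (not sna's brokerage()). A[i, j] = 1 means i -> j; s holds states.
broker_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # j brokers (i, k) when i -> j and j -> k exist but i -> k does not.
    if (!(A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0)) next
    if (s[i] == s[j] && s[j] == s[k]) {
      role <- "coordinator"
    } else if (s[i] == s[k]) {
      role <- "itinerant"
    } else if (s[j] == s[k]) {
      role <- "gatekeeper"
    } else if (s[i] == s[j]) {
      role <- "representative"
    } else {
      role <- "liaison"
    }
    out[j, role] <- out[j, role] + 1
  }
  cbind(out, total = rowSums(out))
}

# Directed 3-cycle 1 -> 2 -> 3 -> 1 with vertices 1 and 2 in the same group.
A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- A[3, 1] <- 1
s <- c(1, 1, 2)
res <- broker_counts(A, s)
print(res)
```

Each vertex of the cycle brokers exactly one ordered pair, but in a different role, because the role depends only on which of the three states coincide.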

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE

R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.
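As a quick arithmetic check on how the columns of the global table relate, the z-score and p-value in a row can be recomputed from its observed score, expectation, and standard deviation. The sketch below uses the w_I row; the two-sided normal tail is an assumption suggested by the Pr(>|z|) column header, and the variable names are illustrative.

```r
# Values copied from the w_I row of the global brokerage table.
t_obs <- 5.0000
t_exp <- 5.8638
t_sd  <- 2.7314

z <- (t_obs - t_exp) / t_sd   # standardized deviation from the expectation
p <- 2 * pnorm(-abs(z))       # two-sided tail under an assumed normal reference
print(round(c(z = z, p = p), 4))
```

The result should agree with the tabulated -0.3162 and 0.7518 up to rounding. As cautioned earlier, the normal reference distribution is itself only a heuristic for these statistics.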

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centrality score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
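As a rough illustration of Equations 1 and 2, the following base-R sketch computes the raw Freeman centralization of a score vector in both forms and normalizes it for degree centrality on an undirected star graph. The function names (freeman_raw, freeman_alt) are hypothetical and are not part of sna; the normalizing constant (n - 1)(n - 2) is the known maximum for degree centrality on undirected graphs and is assumed here rather than obtained from the package.

```r
# Raw Freeman centralization of a vector of centrality scores (Equation 1).
freeman_raw <- function(scores) sum(max(scores) - scores)

# Equivalent form: |V| times (maximum score minus mean score) (Equation 2).
freeman_alt <- function(scores) length(scores) * (max(scores) - mean(scores))

# Undirected star graph on n vertices: vertex 1 is tied to all others.
n <- 6
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1

deg <- rowSums(star)                      # degree centrality scores
raw <- freeman_raw(deg)                   # total deviation from the maximum
normalized <- raw / ((n - 1) * (n - 2))   # assumed maximum for undirected degree
```

Both forms return the same raw value, and the star graph attains the assumed maximum, so its normalized degree centralization is 1.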

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.


[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0
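To make the definitions behind these indices concrete, the following base-R sketch computes density, the three reciprocity variants, and weak transitivity for a small loopless digraph. It does not call sna; the matrix A and all variable names are illustrative assumptions, and the handling of loops and missing data performed by gden, grecip, and gtrans is not reproduced.

```r
# Illustrative 0/1 adjacency matrix with a zero diagonal.
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              0, 0, 0, 1,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
n <- nrow(A)

# Density: observed edges over the n(n - 1) possible directed edges.
dens <- sum(A) / (n * (n - 1))

# Dyad counts, taken from the upper triangle of A and of its transpose.
up   <- upper.tri(A)
mut  <- sum(A[up] == 1 & t(A)[up] == 1)   # mutual dyads
null <- sum(A[up] == 0 & t(A)[up] == 0)   # null dyads
asym <- choose(n, 2) - mut - null         # asymmetric dyads

recip_dyadic   <- (mut + null) / choose(n, 2)   # symmetric dyads / all dyads
recip_nonnull  <- mut / (mut + asym)            # symmetric among non-null dyads
recip_edgewise <- 2 * mut / (2 * mut + asym)    # reciprocated edges / all edges

# Weak transitivity: among two-paths i -> j -> k with i != k, the fraction
# that are closed by a direct i -> k edge.
P   <- A %*% A                    # P[i, k] counts two-paths from i to k
off <- row(A) != col(A)
trans_weak <- sum((P * A)[off]) / sum(P[off])
```

For this graph the dyadic reciprocity is 0.5, the edgewise reciprocity is 0.4, and half of the two-paths are closed.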

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order N, and let $d(i, j)$ be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
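The definition of the structure statistics can be sketched in base R as follows. The helper names (geodesics, structure_stats) are hypothetical, and the distance computation by repeated boolean matrix multiplication is a simple stand-in chosen for brevity, not the algorithm used by geodist or structure.statistics.

```r
# Geodesic distances: the first walk length at which j becomes reachable from i.
geodesics <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  diag(D) <- 0
  reach <- diag(n)                      # walks of length 0
  for (len in seq_len(n - 1)) {
    reach <- (reach %*% A > 0) * 1      # pairs joined by a walk of length `len`
    D[reach == 1 & is.infinite(D)] <- len
  }
  D
}

# s_i = N^{-2} * (number of ordered pairs (j, k) with d(j, k) <= i).
structure_stats <- function(A) {
  n <- nrow(A)
  D <- geodesics(A)
  s <- sapply(0:(n - 1), function(i) sum(D <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Illustrative data: the directed path 1 -> 2 -> 3 -> 4.
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
s <- structure_stats(A)
```

Because the path is not strongly connected, the series levels off below 1; unreachable pairs have infinite distance and never satisfy the indicator.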

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
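The logic of a CUG test can be sketched in a few lines of base R. The example below uses the number of mutual dyads as the test statistic and conditions on order and number of edges; the function names, the observed graph, and the choice of 1000 replications are illustrative assumptions, and the code is not the cugtest implementation.

```r
set.seed(1)

# Test statistic: number of mutual dyads in a loopless 0/1 adjacency matrix.
mutual_count <- function(A) sum(A * t(A)) / 2

# Draw a digraph uniformly from those with n vertices and exactly m edges.
r_given_edges <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))      # indices of off-diagonal cells
  A[sample(cells, m)] <- 1
  A
}

# Illustrative observed graph: three reciprocated ties plus two one-way ties.
G <- matrix(0, 6, 6)
G[cbind(c(1, 2, 3, 4, 5, 6, 1, 2),
        c(2, 1, 4, 3, 6, 5, 3, 4))] <- 1

t_obs  <- mutual_count(G)
t_null <- replicate(1000, mutual_count(r_given_edges(nrow(G), sum(G))))

p_upper <- mean(t_null >= t_obs)   # estimate of Pr(t(H) >= t(G))
p_lower <- mean(t_null <= t_obs)   # estimate of Pr(t(H) <= t(G))
```

The two one-tailed p-values overlap at equality, so they sum to at least 1, just as in the cugtest summaries.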

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v' iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when v and v' have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
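The distance-then-cluster workflow described above can be sketched in base R. The helper se_hamming below is hypothetical: it is a simplified Hamming distance between the in- and out-tie profiles of each vertex pair that ignores the dyad joining the pair, and it is not guaranteed to match the treatment used by sedist. hclust and cutree are the standard R routines.

```r
# Hamming distance between the tie profiles of each pair of vertices.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    others <- setdiff(seq_len(n), c(i, j))          # third parties only
    D[i, j] <- sum(A[i, others] != A[j, others]) +  # differing out-ties
               sum(A[others, i] != A[others, j])    # differing in-ties
  }
  D
}

# Illustrative digraph: vertices 1 and 2 send to 3 and 4, which send to 5.
A <- matrix(0, 5, 5)
A[1, c(3, 4)] <- 1
A[2, c(3, 4)] <- 1
A[3, 5] <- 1
A[4, 5] <- 1

D  <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")
classes <- cutree(hc, h = 0.5)   # vertices at distance 0 share a position
```

Cutting the dendrogram just above height 0 recovers the exact structural equivalence classes {1, 2}, {3, 4}, and {5}; larger cut heights would merge approximately equivalent vertices.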

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
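Density reduction and expansion can be illustrated with a short base-R sketch. The functions block_density and expand_blocks are hypothetical stand-ins for the ideas behind blockmodel and blockmodel.expand, not their implementations; in particular, treating an undefined within-block density (a class with a single vertex) as probability 0 is an assumption made here for simplicity.

```r
set.seed(42)

# Block densities: mean of the off-diagonal cells linking class r to class s.
block_density <- function(A, classes) {
  k <- max(classes)
  B <- matrix(NA_real_, k, k)
  off <- row(A) != col(A)
  for (r in seq_len(k)) for (s in seq_len(k)) {
    cells <- outer(classes == r, classes == s) & off
    if (any(cells)) B[r, s] <- mean(A[cells])
  }
  B
}

# Expansion: read block densities as tie probabilities for new class sizes.
expand_blocks <- function(B, sizes) {
  classes <- rep(seq_along(sizes), times = sizes)
  P <- B[classes, classes]                 # vertex-level probability matrix
  P[is.na(P)] <- 0                         # assumption: undefined blocks are empty
  n <- length(classes)
  G <- matrix(rbinom(n * n, 1, P), n, n)   # independent Bernoulli draws
  diag(G) <- 0
  G
}

# Illustrative digraph with classes {1, 2}, {3, 4}, {5}.
A <- matrix(0, 5, 5)
A[1:2, 3:4] <- 1
A[3:4, 5] <- 1
B <- block_density(A, c(1, 1, 2, 2, 3))
G <- expand_blocks(B, sizes = c(3, 3, 2))  # new graph of order 8
```

Here every block density is 0 or 1, so the expanded graph is determined by the image; intermediate densities would yield a different random graph on each call.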

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs G and $G'$ on vertex set V is the graph on V such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
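The boolean inner product characterization of composition is easy to check in base R. The function name compose and the two small graphs are illustrative assumptions; the sketch works directly on adjacency matrices rather than through the sna operator.

```r
# Composition as a boolean matrix product: (v, v') is an edge of the result
# iff some intermediate v'' has (v, v'') in the first graph and (v'', v') in
# the second.
compose <- function(A, B) (A %*% B > 0) * 1

# Illustrative graphs on three vertices.
G  <- matrix(0, 3, 3); G[1, 2] <- 1                  # 1 -> 2
Gp <- matrix(0, 3, 3); Gp[2, 1] <- 1; Gp[2, 3] <- 1  # 2 -> 1 and 2 -> 3

GG <- compose(G, Gp)
```

Neither input graph has loops, yet the composition contains the loop (1, 1) alongside the edge (1, 3), which is why the diagonal should be retained.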

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left( A^G_{ij} - \mu_G \right) \left( A^H_{ij} - \mu_H \right)}{|V| (|V| - 1)}, \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of G and H, and $\mu_X = (|V| (|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
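Equation 3 and the associated correlation and Hamming distance translate directly into base R for loopless digraphs. The function names below (gcov_sketch, gcor_sketch, hamming_sketch) are hypothetical and the example graphs are illustrative; none of this is the code behind gcov, gcor, or hdist.

```r
# Off-diagonal cells of an adjacency matrix, as a vector of edge variables.
offdiag <- function(A) A[row(A) != col(A)]

# Graph covariance (Equation 3): mean cross-product over |V|(|V| - 1) cells.
gcov_sketch <- function(A, B) {
  a <- offdiag(A)
  b <- offdiag(B)
  mean((a - mean(a)) * (b - mean(b)))
}

# Graph correlation: covariance scaled by the two graph standard deviations.
gcor_sketch <- function(A, B) {
  gcov_sketch(A, B) / sqrt(gcov_sketch(A, A) * gcov_sketch(B, B))
}

# Hamming distance: number of edge variables on which the graphs differ.
hamming_sketch <- function(A, B) sum(offdiag(A) != offdiag(B))

# Illustrative data: a directed 3-cycle and the same cycle reversed.
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)
B <- t(A)

rho <- gcor_sketch(A, B)
hd  <- hamming_sketch(A, B)
```

The reversed cycle disagrees with the original on every edge variable, so the labeled graph correlation is -1 and the Hamming distance is 6, even though the two graphs are isomorphic.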

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
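For very small graphs, the maximization over matchings can be carried out by brute force, which makes the idea easy to see. The sketch below assumes that every relabeling of the second graph is permissible and searches all of them; perms, gcor_sketch, and gscor_exhaustive are hypothetical names, and the code is not the optimizer used by gscor.

```r
# All permutations of 1..n, one per row (n! rows; tiny n only).
perms <- function(n) {
  if (n == 1) return(matrix(1, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, p + (p >= i))))
}

# Graph correlation over off-diagonal cells (Pearson correlation of edge sets).
gcor_sketch <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# Structural correlation: best graph correlation over relabelings of B.
gscor_exhaustive <- function(A, B) {
  P <- perms(nrow(B))
  max(apply(P, 1, function(p) gcor_sketch(A, B[p, p])))
}

# Illustrative data: a directed 3-cycle and its reversal (isomorphic graphs).
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)
B <- t(A)

labeled    <- gcor_sketch(A, B)        # depends on the given labeling
structural <- gscor_exhaustive(A, B)   # maximized over all labelings
```

The labeled correlation is -1 while the structural correlation is 1. The number of relabelings grows factorially with order, which is why heuristic search is used for anything beyond toy cases.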

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
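The QAP null hypothesis can be sketched in base R as a permutation test on the graph correlation: the vertices of one graph are relabeled at random, which preserves its structure while breaking any alignment with the other graph. The function names, the simulated data, and the replication count are illustrative assumptions, and the code is not the qaptest implementation.

```r
set.seed(7)

# Graph correlation over off-diagonal cells.
gcor_sketch <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# QAP test: compare the observed statistic to its relabeling distribution.
qap_sketch <- function(A, B, reps = 1000) {
  obs <- gcor_sketch(A, B)
  n <- nrow(B)
  null <- replicate(reps, {
    p <- sample(n)                 # random relabeling of B's vertices
    gcor_sketch(A, B[p, p])        # rows and columns permuted together
  })
  list(obs = obs, p_ge = mean(null >= obs), p_le = mean(null <= obs))
}

# Illustrative data: B is a copy of A with five edge variables flipped.
n <- 10
A <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(A) <- 0
B <- A
flip <- sample(which(row(A) != col(A)), 5)
B[flip] <- 1 - B[flip]

res <- qap_sketch(A, B)
```

Permuting rows and columns jointly is the essential step: it keeps every structural feature of B intact, so the null distribution reflects only the loss of correspondence between labels.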


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306

R> gcor(g1, g3)
[1] 0.08908708

R> gcor(g2, g3)
[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
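The point estimates above can be understood through the equivalence between network regression and ordinary regression on vectorized adjacency matrices. The following base-R sketch rebuilds the same design with simulated graphs and fits it with lm; the helper rg and the data frame layout are illustrative assumptions, and the sketch reproduces only the coefficient estimates, not the QAP tests that netlm adds.

```r
set.seed(3)
n <- 20

# Random loopless digraph with tie probability 0.5 (stand-in for rgraph).
rg <- function() {
  A <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(A) <- 0
  A
}

x <- replicate(4, rg(), simplify = FALSE)   # four predictor graphs
y <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]       # x[[4]] is irrelevant by design

# Vectorize the off-diagonal edge variables of every graph.
off <- row(y) != col(y)
dat <- data.frame(y  = y[off],
                  x1 = x[[1]][off], x2 = x[[2]][off],
                  x3 = x[[3]][off], x4 = x[[4]][off])

fit <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
est <- coef(fit)
```

With 20 vertices there are 380 directed edge variables and five coefficients, giving the 375 residual degrees of freedom reported in the summary above.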

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
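The reporting process described above can be illustrated for the simplest case, in which error rates are treated as known. The following base R sketch is not part of sna: the function name edge_posterior and all numeric values are hypothetical, and reports are assumed to be conditionally independent given the true edge state. It applies Bayes' rule to a single edge under a Bernoulli prior.

```r
# Posterior probability that an edge is present, given binary reports y
# from several informants with known error rates (illustrative sketch).
#   em: false negative rates, Pr(report = 0 | edge present)
#   ep: false positive rates, Pr(report = 1 | edge absent)
#   prior: Bernoulli prior probability that the edge is present
edge_posterior <- function(y, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(y == 1, 1 - em, em))   # Pr(reports | edge present)
  lik0 <- prod(ifelse(y == 1, ep, 1 - ep))   # Pr(reports | edge absent)
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Three informants with hypothetical rates: frequent misses, rare false alarms
em <- rep(0.375, 3)
ep <- rep(0.038, 3)

post_one_yes <- edge_posterior(c(1, 0, 0), em, ep)  # one positive report
post_all_no  <- edge_posterior(c(0, 0, 0), em, ep)  # no positive reports
```

Because false negatives are assumed to be much more common than false positives here, a single positive report among three is enough to push the posterior above one half, which a simple majority rule would not do.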

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
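As a small numerical illustration of the conditional nomination probability above, the following base R sketch evaluates the formula directly. The function name biased_tie_prob and the particular bias probabilities are hypothetical and chosen only for illustration; this is not the estimation code used by bn.

```r
# Conditional nomination probability under a biased net model:
#   Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^(s_k)
# p_base: baseline probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      vector of counts s_k(i, j, T) of bias events that could occur
biased_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s),
            p_base >= 0, p_base <= 1,
            all(p_bias >= 0 & p_bias <= 1),
            all(s >= 0))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical setting: baseline 0.05, one bias of 0.3 and one of 0.1
p_none <- biased_tie_prob(0.05, c(0.3, 0.1), c(0, 0))  # no bias events available
p_both <- biased_tie_prob(0.05, c(0.3, 0.1), c(1, 2))  # first once, second twice
```

With no bias events available the probability reduces to the baseline; each additional opportunity for a bias event can only raise it.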

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

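$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \tag{5}$$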

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is that the AR term includes the impact of neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
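To give a sense of what an autocorrelation index such as Moran's I measures, the following base R sketch computes it by hand for a toy network. The function morans_i and the example data are hypothetical and written only for illustration; this is a textbook form of the index, not the implementation used by nacf.

```r
# Moran's I for attribute x on a network with weight matrix w:
#   I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
# where S0 is the sum of all weights.
morans_i <- function(x, w) {
  n <- length(x)
  stopifnot(is.matrix(w), nrow(w) == n, ncol(w) == n)
  z <- x - mean(x)          # centered attribute
  s0 <- sum(w)              # total weight
  (n / s0) * as.numeric(t(z) %*% w %*% z) / sum(z^2)
}

# Toy example: an undirected 4-cycle (1-2-3-4-1)
w <- matrix(0, 4, 4)
w[cbind(c(1, 2, 3, 4), c(2, 3, 4, 1))] <- 1
w <- w + t(w)

x_alt <- c(1, -1, 1, -1)   # adjacent vertices always differ
x_blk <- c(1, 1, -1, -1)   # adjacent vertices agree on half of the edges
i_alt <- morans_i(x_alt, w)
i_blk <- morans_i(x_blk, w)
```

Strongly negative values indicate that neighbors tend to hold opposing values, while values near zero indicate no first-order association; patterns of this kind are what one would inspect when proposing weight matrices for lnam.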


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115
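For readers unfamiliar with the potential scale reduction statistic reported in the diagnostics above, the following base R sketch shows one common simplified form of the calculation for a single scalar parameter. The function psrf and the simulated chains are hypothetical; this is an illustration of the general idea rather than the code used by potscalered.mcmc.

```r
# Potential scale reduction (sqrt(Rhat)) for one scalar parameter.
# chains: numeric matrix with one column per chain and one row per draw.
psrf <- function(chains) {
  n <- nrow(chains)                      # draws per chain
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(chains))         # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(123)
# Five well-mixed chains sampling the same (hypothetical) target
mixed <- matrix(rnorm(5 * 200), nrow = 200, ncol = 5)
# Five chains stuck at different locations
stuck <- sweep(mixed, 2, c(-4, -2, 0, 2, 4), "+")

r_mixed <- psrf(mixed)
r_stuck <- psrf(stuck)
```

Values close to 1 (as for r_mixed) are consistent with convergence, while values well above 1 (as for r_stuck) indicate that the chains disagree about the location of the posterior.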

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors, or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
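One simple way to check how much prior mass falls in the “perverse” region is by Monte Carlo. The base R sketch below does this for the beta(2, 11) priors used in this example and, for comparison, for uniform priors. It assumes that the false negative and false positive rates are a priori independent; the variable names and the number of draws are arbitrary choices made for illustration.

```r
set.seed(2008)
n_draws <- 1e5

# Beta(2, 11) priors on both error rates, as in the example
em_b <- rbeta(n_draws, 2, 11)
ep_b <- rbeta(n_draws, 2, 11)
mass_beta <- mean(em_b + ep_b > 1)   # estimated prior mass with em + ep > 1

# Uniform (Beta(1, 1)) priors for comparison
em_u <- rbeta(n_draws, 1, 1)
ep_u <- rbeta(n_draws, 1, 1)
mass_unif <- mean(em_u + ep_u > 1)
```

Under the beta(2, 11) priors the perverse region receives almost no mass, whereas uniform priors place roughly half of their mass there, which is the situation the text above cautions against.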

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
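The logic behind the simpler data analytic estimators can be sketched in a few lines of base R. The code below is an illustration only, not the implementation used by consensus: it assumes that informants coincide with the vertices, uses hypothetical common error rates, and takes the central graph to be the element-wise majority rule and the LAS rules to use only the two endpoints' own reports.

```r
set.seed(1)
n <- 10                                   # vertices (and informants)
g <- matrix(rbinom(n * n, 1, 0.3), n, n)  # hypothetical "true" digraph
diag(g) <- 0
fn <- 0.35                                # assumed false negative rate
fp <- 0.05                                # assumed false positive rate

# dat[k, i, j]: informant k's report on the i -> j edge
dat <- array(0, dim = c(n, n, n))
p <- g * (1 - fn) + (1 - g) * fp          # probability of a positive report
for (k in seq_len(n)) {
  dat[k, , ] <- matrix(rbinom(n * n, 1, p), n, n)
}

# Central graph: element-wise majority rule across informants
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# LAS: use only the reports of the two endpoints, i and j
las_int <- las_uni <- matrix(0, n, n)
for (i in seq_len(n)) for (j in seq_len(n)) {
  r <- c(dat[i, i, j], dat[j, i, j])
  las_int[i, j] <- min(r)   # both endpoints must report the edge
  las_uni[i, j] <- max(r)   # either endpoint suffices
}

# Proportion of dyads recovered correctly by each rule
acc <- c(central = mean(central == g),
         las_int = mean(las_int == g),
         las_uni = mean(las_uni == g))
```

Because the intersection rule requires two positive reports, it compounds false negatives, while the union rule compounds false positives; the majority rule pools all informants and is therefore less sensitive to either, as long as neither rate is close to 0.5.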

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

Figure 3: Sample visualizations using gplot with multiple layout and display options. Panels: Default; Curved Edges; MDS Layout; Circular Layout; Sociomatrix; Multiple Options.

R> par(mfrow = c(2, 3))
R> gplot(g, main = "Default")
R> gplot(g, usecurv = TRUE, main = "Curved Edges")
R> gplot(g, mode = "mds", main = "MDS Layout")
R> gplot(g, mode = "circle", main = "Circular Layout")
R> plot.sociomatrix(g, main = "Sociomatrix")
R> gplot(g, diag = TRUE, vertex.cex = 1:5, vertex.sides = 3:8,
+    vertex.col = 1:5, vertex.border = 2:6, vertex.rot = (0:4) * 72,
+    displaylabels = TRUE, label.bg = "gray90", main = "Multiple Options")

Output from the above is shown in Figure 3.

Three-dimensional display using gplot3d can be especially useful when examining networks with non-planar structure. In the following example, we see how gplot3d can be used to visualize the behavior of a three-dimensional Watts-Strogatz rewired lattice process. (This example requires the rgl package to execute.)

R> gplot3d(rgws(1, 5, 3, 1, 0))
R> gplot3d(rgws(1, 5, 3, 1, 0.05))
R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

Figure 5: Examples of the use of gplot supplemental functions (panels, left to right: gplot.vertex Example, gplot.arrow Example, gplot.loop Example).

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times G \mapsto \mathbb{R}$, where $G$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: G \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v,G) \equiv c_{d^+}(v,G) + c_{d^-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $n = |V|$ and $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
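The degree and closeness definitions above can be checked by hand on a small adjacency matrix. The following is a minimal base R sketch, not the sna implementation: the matrix A and the helper geodesic_dist are hypothetical names introduced here for illustration, and unreachable pairs are assigned infinite distance as described in the text.

```r
# Hypothetical 4-vertex digraph; A[i, j] = 1 means i sends a tie to j.
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)

outdeg  <- rowSums(A)          # |N+(v)|: ties sent
indeg   <- colSums(A)          # |N-(v)|: ties received
freeman <- outdeg + indeg      # total ("Freeman") degree

# Geodesic distances: d(i, j) is the smallest k for which a walk of
# length k runs from i to j; unreachable pairs keep distance Inf.
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  diag(D) <- 0
  walk <- diag(n)                        # walks of length 0
  for (k in seq_len(n - 1)) {
    walk <- (walk %*% A > 0) * 1         # pairs joined by a walk of length k
    D[walk == 1 & is.infinite(D)] <- k   # record first (shortest) arrival
  }
  D
}

D <- geodesic_dist(A)
# Closeness as defined above; a vertex that cannot reach some other vertex
# has an infinite distance sum and therefore scores 0.
closeness <- (nrow(A) - 1) / rowSums(D)
```

In this toy graph the fourth vertex sends no ties, so its distance sum is infinite and its closeness is 0, which illustrates the fragility noted above.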

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
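The two limiting cases of the Bonacich measure can be illustrated directly from the formula. The sketch below is a base R illustration under stated assumptions, not the bonpow implementation: bonacich_power is a hypothetical helper, the scaling parameter $\alpha$ is fixed at 1, and the small undirected graph is invented for the example.

```r
# Unscaled Bonacich power: (I - beta * A)^{-1} A 1, with alpha taken as 1.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  as.vector(solve(diag(n) - beta * A, A %*% rep(1, n)))
}

# Hypothetical undirected graph: a triangle (2, 3, 4) with a pendant (1).
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 1,
              0, 1, 0, 1,
              0, 1, 1, 0), nrow = 4, byrow = TRUE)

# beta = 0: the measure reduces to A 1, i.e., the (out)degree.
bp_zero <- bonacich_power(A, 0)

# beta close to 1 / lambda_1: the scores align with the principal eigenvector.
ev        <- eigen(A)                    # symmetric A: values sorted decreasing
bp_near   <- bonacich_power(A, 0.99 / ev$values[1])
principal <- abs(ev$vectors[, 1])        # sign of an eigenvector is arbitrary
alignment <- cor(bp_near, principal)

# beta < 0: scores now fall with the centrality of one's alters.
bp_neg <- bonacich_power(A, -0.25)
```

Because the measure is defined only up to rescaling, the comparison with the eigenvector uses a correlation rather than an equality of raw values.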

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v',v'')$ such that $(v',v),(v,v'') \in E$ and $(v',v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i,v_j,v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
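The role definitions above translate into a direct (if inefficient) enumeration over ordered triads. The following base R sketch assumes a loopless 0/1 adjacency matrix and a membership vector; brokerage_counts is a hypothetical helper written for this illustration and is not the algorithm used by brokerage, nor does it compute the null-model moments.

```r
# Count Gould-Fernandez brokerage roles by brute force.
# A: 0/1 adjacency matrix (row = sender); s: vector of vertex states.
brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper",
             "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (i == j || j == k || i == k) next
    # v_j brokers the ordered pair (v_i, v_k) when i -> j -> k but not i -> k.
    if (A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0) {
      role <- if (s[i] == s[j] && s[j] == s[k]) {
        "coordinator"        # s_i = s_j = s_k
      } else if (s[i] == s[k]) {
        "itinerant"          # s_i = s_k, s_i != s_j
      } else if (s[j] == s[k]) {
        "gatekeeper"         # s_j = s_k, s_i != s_j
      } else if (s[i] == s[j]) {
        "representative"     # s_i = s_j, s_j != s_k
      } else {
        "liaison"            # all three states differ
      }
      out[j, role] <- out[j, role] + 1
    }
  }
  cbind(out, total = rowSums(out))
}

# Hypothetical two-path 1 -> 2 -> 3 with no direct 1 -> 3 tie.
path3 <- matrix(0, 3, 3)
path3[1, 2] <- 1
path3[2, 3] <- 1
rep_case <- brokerage_counts(path3, s = c(1, 1, 2))   # broker shares sender's group
itn_case <- brokerage_counts(path3, s = c(1, 2, 1))   # broker is an outsider
```

The triple loop is cubic in the number of vertices, which is acceptable for a small illustration but not for large graphs.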

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v,G) \right) - c(v_i,G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v,G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i,G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
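The equivalence of the two forms of the raw centralization score, Equations (1) and (2), is easy to confirm numerically. The snippet below is a base R sketch on a hypothetical score vector; it does not call centralization and performs no normalization.

```r
# Hypothetical centrality scores for a five-vertex graph.
scores <- c(3, 1, 2, 2, 0)

# Equation (1): total deviation from the maximum observed score.
eq1 <- sum(max(scores) - scores)

# Equation (2): |V| times the gap between the maximum and mean scores.
eq2 <- length(scores) * (max(scores) - mean(scores))
```

Both expressions evaluate to the same quantity because summing the constant maximum over $|V|$ vertices gives $|V| c^*(G)$, while summing the scores gives $|V| \bar{c}(G)$.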

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
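The gap between dyadic and edgewise reciprocity seen in the output above follows from how null dyads are treated. A minimal base R sketch on a hypothetical loopless digraph (not the gden or grecip code, and ignoring missing data) makes the distinction concrete:

```r
# Hypothetical digraph with edges 1->2, 2->1, 1->3, 3->4.
A <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
n <- nrow(A)

# Density: edges present over the n(n - 1) possible directed edges.
density <- sum(A) / (n * (n - 1))

# Dyad counts, taking each unordered pair once via the upper triangle.
ut     <- upper.tri(A)
mutual <- sum(A[ut] == 1 & t(A)[ut] == 1)
null   <- sum(A[ut] == 0 & t(A)[ut] == 0)
asym   <- sum(ut) - mutual - null

# Dyadic reciprocity: symmetric (mutual or null) dyads over all dyads.
recip_dyadic <- (mutual + null) / sum(ut)

# Edgewise reciprocity: fraction of edges that are reciprocated.
recip_edgewise <- 2 * mutual / sum(A)
```

A sparse graph can thus score highly on the dyadic measure purely through its null dyads while having an edgewise reciprocity of zero, as in the first two graphs of the example.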

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
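The calling convention can be mimicked in a few lines of base R. In the sketch below, centralize_sketch is a hypothetical stand-in for the wrapper (it is not the sna function, and omits its other arguments), and outdeg_index is an invented index function that honors tmaxdev; the maximum deviation $(n-1)^2$ used for outdegree is attained by the out-star on a loopless digraph.

```r
# Stand-in wrapper: raw Freeman centralization divided by the maximum
# deviation reported by the index function itself.
centralize_sketch <- function(A, FUN, ...) {
  scores <- FUN(A, tmaxdev = FALSE, ...)
  sum(max(scores) - scores) / FUN(A, tmaxdev = TRUE, ...)
}

# Hypothetical user-supplied index: outdegree on a loopless digraph.
outdeg_index <- function(A, tmaxdev = FALSE, ...) {
  n <- NROW(A)
  if (tmaxdev) return((n - 1)^2)   # deviation attained by the out-star
  rowSums(A)
}

# Out-star: vertex 1 sends to everyone else.
star <- matrix(0, 5, 5)
star[1, -1] <- 1

# Directed ring: every vertex has outdegree 1.
ring <- matrix(0, 5, 5)
ring[cbind(1:5, c(2:5, 1))] <- 1

cent_star <- centralize_sketch(star, outdeg_index)
cent_ring <- centralize_sketch(ring, outdeg_index)
```

The two test graphs sit at the extremes of the normalized index: the star is maximally centralized, while the ring, whose vertices all have equal outdegree, is not centralized at all.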

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V,E)$ be a (possibly directed) graph of order $N$, and let $d(i,j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
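The structure statistics follow directly from the geodesic distance matrix. The base R sketch below is an illustration only: structure_stats is a hypothetical helper (not the sna routine of similar name), and distances are obtained by repeated boolean matrix multiplication, which is adequate for small graphs.

```r
# Structure statistics s_0, ..., s_{N-1} for a 0/1 adjacency matrix A.
structure_stats <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)                 # geodesic distances; Inf = unreachable
  diag(D) <- 0
  walk <- diag(n)
  for (k in seq_len(n - 1)) {
    walk <- (walk %*% A > 0) * 1         # pairs joined by a walk of length k
    D[walk == 1 & is.infinite(D)] <- k   # first arrival gives the geodesic
  }
  # s_i = N^{-2} * number of ordered pairs (j, k) with d(j, k) <= i
  s <- sapply(0:(n - 1), function(i) sum(D <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Hypothetical example: the directed 3-cycle 1 -> 2 -> 3 -> 1.
cyc <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 0, 0), nrow = 3, byrow = TRUE)
s_cyc <- structure_stats(cyc)
```

Note that $s_0 = 1/N$ always, since each vertex lies at distance 0 from itself only, and that the series reaches 1 exactly when the graph is strongly connected.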

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cug.test wrapper function can be used to facilitate such comparisons. Using the gliop routine, cug.test can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
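The logic of a one-tailed CUG test can be sketched in base R by Monte Carlo. The example below conditions on order and on the exact number of edges and uses the count of mutual dyads as the test statistic; all function names are hypothetical, the input is assumed to be a loopless 0/1 digraph, and this is not the cug.test implementation.

```r
set.seed(1)  # fixed seed so that the illustration is reproducible

# Test statistic: number of mutual dyads in a loopless 0/1 digraph.
count_mutual <- function(A) sum(A * t(A)) / 2

# Draw a digraph uniformly given its order n and its number of edges m.
rgraph_fixed_edges <- function(n, m) {
  A <- matrix(0, n, n)
  off_diag <- which(row(A) != col(A))    # loops are excluded
  A[sample(off_diag, m)] <- 1
  A
}

# Estimated upper- and lower-tail p-values, Pr(t(H) >= t(G)) and
# Pr(t(H) <= t(G)), from reps uniform draws H.
cug_sketch <- function(A, stat, reps = 1000) {
  obs  <- stat(A)
  null <- replicate(reps, stat(rgraph_fixed_edges(nrow(A), sum(A))))
  c(upper = mean(null >= obs), lower = mean(null <= obs))
}

# Hypothetical observed graph: four mutual dyads along a path on 5 vertices.
G <- matrix(0, 5, 5)
G[cbind(1:4, 2:5)] <- 1
G <- G + t(G)

p <- cug_sketch(G, count_mutual)
```

Because every draw satisfies at least one of the two inequalities, the two estimated tail probabilities always sum to at least 1; they sum to more than 1 whenever some draws tie with the observed value.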

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00
R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cug.test(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cug.test(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in and outneighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
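The distance-then-cluster workflow described above can be approximated with base R alone. The sketch below uses a Hamming-type count of mismatched ties to third parties, following the definition of structural equivalence given earlier (ties within the pair itself are excluded); se_hamming is a hypothetical helper, and its treatment of the pair's own tie is an assumption that may differ from sedist.

```r
# Pairwise structural-equivalence distance: number of third parties to or
# from which the two vertices' ties disagree.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    others <- setdiff(seq_len(n), c(i, j))
    d <- sum(A[i, others] != A[j, others]) +   # outgoing ties
         sum(A[others, i] != A[others, j])     # incoming ties
    D[i, j] <- D[j, i] <- d
  }
  D
}

# Hypothetical digraph: vertices 1 and 2 both send ties to 3 and 4.
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1

D    <- se_hamming(A)
tree <- hclust(as.dist(D), method = "complete")   # base R clustering
pos  <- cutree(tree, k = 2)                       # two candidate positions
```

Here the senders and the receivers form two exactly equivalent classes, so their within-class distances are zero and cutting the tree at two clusters recovers them.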

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $(i,j)$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
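As a rough illustration of density-based expansion, the following base R sketch expands a block density matrix into an edge-probability matrix and draws one Bernoulli graph from it. The function expand_density and the example densities are assumptions made for this example; blockmodel.expand itself should be used in practice.

```r
# Expand a k x k density image B into a graph with sizes[r] vertices in class r.
expand_density <- function(B, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)    # class of each new vertex
  P <- B[cls, cls, drop = FALSE]                 # Bernoulli parameter matrix
  n <- length(cls)
  G <- matrix(rbinom(n * n, size = 1, prob = P), n, n)  # independent edge draws
  diag(G) <- 0                                   # no loops
  G
}

# Hypothetical two-class image with dense within-class ties.
B <- matrix(c(0.9, 0.1,
              0.2, 0.8), 2, 2, byrow = TRUE)
set.seed(42)
G <- expand_density(B, sizes = c(3, 4))          # a 7-vertex draw
```

Because the class sizes are free parameters, the expanded graph need not have the same order as the graph from which the image was estimated.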

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
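The boolean inner product mentioned above is easy to check by hand. The sketch below uses a hypothetical helper, compose, written in base R (it is not sna's operator), and shows how a loop can arise from two loopless graphs.

```r
# Boolean matrix product: (i, j) is an edge if some k has i -> k in A and k -> j in B.
compose <- function(A, B) (A %*% B > 0) * 1

A <- matrix(0, 3, 3); A[1, 2] <- 1   # G:  single edge 1 -> 2
B <- matrix(0, 3, 3); B[2, 1] <- 1   # G': single edge 2 -> 1

C <- compose(A, B)                   # contains the loop 1 -> 1
```

Neither input has a nonzero diagonal, yet the composition does, which is why the diagonal should be retained when analyzing composed graphs.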

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G$, $H$, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$


where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be implemented simply by applying eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
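As a small illustration of this workflow, the sketch below builds a Hamming distance matrix for four toy graphs in base R and passes it to cmdscale and hclust. The graphs and the helper hamming are hypothetical stand-ins; with sna loaded, hdist would normally supply the distance matrix.

```r
# Four hypothetical digraphs on three vertices.
empty <- matrix(0, 3, 3)
g1 <- empty; g1[1, 2] <- 1
g2 <- empty; g2[2, 3] <- 1
g3 <- g1 + g2
graphs <- list(empty, g1, g2, g3)

hamming <- function(A, B) sum(A != B)

# Pairwise Hamming distances among the graphs.
m <- length(graphs)
D <- matrix(0, m, m)
for (i in seq_len(m)) {
  for (j in seq_len(m)) {
    D[i, j] <- hamming(graphs[[i]], graphs[[j]])
  }
}

coords <- cmdscale(as.dist(D), k = 2)   # metric MDS coordinates
hc <- hclust(as.dist(D))                # hierarchical clustering of the graphs
```

The resulting coordinates and dendrogram can be plotted with the usual plot methods.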

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
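The logic of a QAP test can be sketched in base R as follows: the statistic is recomputed after jointly permuting the rows and columns of one graph, and the observed value is compared with the resulting distribution. The functions gcor_simple and qap_cor are hypothetical names used for this illustration only; qaptest is the supported interface.

```r
offdiag <- function(A) A[row(A) != col(A)]
gcor_simple <- function(A, B) cor(offdiag(A), offdiag(B))

# QAP null distribution for the graph correlation: relabel the vertices of A
# (same permutation for rows and columns) and recompute the statistic.
qap_cor <- function(A, B, reps = 1000) {
  n <- nrow(A)
  obs <- gcor_simple(A, B)
  perm <- replicate(reps, {
    o <- sample(n)
    gcor_simple(A[o, o], B)
  })
  list(obs = obs, perm = perm,
       p_ge = mean(perm >= obs),   # estimated Pr(statistic >= observed)
       p_le = mean(perm <= obs))   # estimated Pr(statistic <= observed)
}

A <- matrix(0, 6, 6); A[cbind(1:5, 2:6)] <- 1   # hypothetical directed path
set.seed(1)
res <- qap_cor(A, A, reps = 200)
```

Because relabeling preserves the unlabeled structure of each graph, the test conditions on everything except the assignment of individuals to positions.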


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173 0.680   0.320   0.503
x1           0.9411361   2.5628914 0.985   0.015   0.019
x2           4.1473292  63.2648084 1.000   0.000   0.000
x3           1.8630911   6.4436238 1.000   0.000   0.000
x4          -0.1757242   0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
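To make the error-rate parameterization concrete, consider the simplest case mentioned above, in which the error rates are given as known. Assuming that informants report independently given the true edge state, the posterior probability of a single edge then follows directly from Bayes' rule, as in the base R sketch below. The function edge_posterior and the example values are hypothetical; this is not the Gibbs sampler used by bbnam.

```r
# Posterior probability that an edge exists, given binary reports y from
# several informants with known false negative (em) and false positive (ep)
# rates and a Bernoulli prior on the edge. Reports are assumed independent
# given the true state.
edge_posterior <- function(y, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(y == 1, 1 - em, em))   # likelihood if the edge is present
  lik0 <- prod(ifelse(y == 1, ep, 1 - ep))   # likelihood if the edge is absent
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Three informants, two of whom report the edge (illustrative rates).
edge_posterior(y = c(1, 1, 0), em = rep(0.4, 3), ep = rep(0.05, 3))
```

An informant with em = ep = 0.5 contributes the same likelihood under both states and therefore leaves the prior unchanged.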

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
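The conditional tie probability above is simple to evaluate numerically. The following base R sketch does so for given baseline and bias event probabilities; the function name bn_tie_prob, its arguments, and the example values are illustrative assumptions and are not part of bn.

```r
# Pr(e_ij | T) = 1 - (1 - p_base) * prod_k (1 - p_bias[k])^s[k]
#   p_base: baseline tie probability, Pr(B_e)
#   p_bias: probabilities of each type of bias event, Pr(B_k)
#   s:      number of potential events of each type for the (i, j) dyad
bn_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Illustrative values: two bias types, with two potential events of the
# first type and one of the second.
bn_tie_prob(p_base = 0.05, p_bias = c(0.3, 0.2), s = c(2, 1))
```

When no bias events are possible (all s equal to 0), the expression reduces to the baseline probability.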

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes, which are often used for this purpose. These models are of the form

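$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$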

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
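It can be helpful to write Equations 4 and 5 in reduced form, which is also how data are typically simulated from the model. The derivation below is a sketch under the assumption (not stated explicitly above) that the matrices $I - \sum_i \theta_i W_i$ and $I - \sum_i \psi_i Z_i$ are invertible.

```latex
\begin{align*}
\Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)\varepsilon &= \nu
  &&\Rightarrow\quad
  \varepsilon = \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \\
\Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr) y &= X\beta + \varepsilon
  &&\Rightarrow\quad
  y = \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1}
      \Bigl[ X\beta + \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu \Bigr].
\end{align*}
```

In this form, the AR term propagates both the covariate effects $X\beta$ and the disturbances through the network, whereas the MA term acts on the disturbances $\nu$ alone.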


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed, with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

       e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

    Min    1stQ   Median Mean   3rdQ   Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

    Min       1stQ      Median    Mean      3rdQ      Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116   Med: 0.9992194   IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error Z value Pr(>|z|)
X1     -0.331259  0.010831   -30.58  <2e-16 ***
X2      0.535608  0.009448    56.69  <2e-16 ***
X3     -0.685068  0.007138   -95.98  <2e-16 ***
X4      0.691812  0.008417    82.19  <2e-16 ***
X5      0.016491  0.007890     2.09  0.0366 *
rho1.1  0.194935  0.002575    75.71  <2e-16 ***
rho2.1  0.307491  0.021167    14.53  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma 0.09597  9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. Pairs $(i, j)$ for which $i$'s net influence on $j$ is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which $i$'s net influence on $j$ is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

Figure 4: Three-dimensional visualizations of a Watts-Strogatz process at increasing rewiring rates.

R> gplot3d(rgws(1, 5, 3, 1, 0.2))

Snapshots of the resulting visualizations are shown in Figure 4. While not evident from the sampled output, the usual interactive features of rgl (e.g., rotation, zooming, etc.) are available when using gplot3d; this can in and of itself be useful when examining large, complex structures.

As noted, the lower-level routines used by gplot to produce vertices and edges can be employed directly within other displays. For instance, consider the following:

R> par(mfrow = c(1, 3))
R> plot(0, 0, type = "n", xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), asp = 1,
+    xlab = "", ylab = "", main = "gplot.vertex Example")
R> gplot.vertex(cos((1:10) / 10 * 2 * pi), sin((1:10) / 10 * 2 * pi),
+    col = 1:10, sides = 3:12, radius = 0.1)
R> plot(1:2, 1:2, xlab = "", ylab = "", main = "gplot.arrow Example")
R> gplot.arrow(1, 1, 2, 2, width = 0.01, col = "red", border = "black")
R> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), asp = 1,
+    xlab = "", ylab = "", main = "gplot.loop Example")
R> gplot.loop(c(0, 0), c(1, -1), col = c(3, 2), width = 0.05, length = 0.4,
+    offset = sqrt(2) / 4, angle = 20, radius = 0.5, edge.steps = 50,
+    arrowhead = TRUE)
R> polygon(c(0.25, -0.25, -0.25, 0.25, NA, 0.25, -0.25, -0.25, 0.25), c(1.25,
+    1.25, 0.75, 0.75, NA, -1.25, -1.25, -0.75, -0.75), col = c(2, 3))

The corresponding output, shown in Figure 5, suggests some of the flexibility of the gplot tools. These functions may be used to add elements to existing gplot output, or to create alternative display mechanisms. They may also be used within non-network contexts, as polygon-based alternatives to R's built-in points and arrows commands.

2.3. Descriptive indices

The literature of social network analysis is rich with descriptive indices of various sorts, all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices, and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f : V \times \mathcal{G} \mapsto \mathbb{R}$, where $\mathcal{G}$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f : \mathcal{G} \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

[Figure 5: Examples of the use of gplot supplemental functions. Panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example.]

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v, G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v, G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v, G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v, G) \equiv c_{d^+}(v, G) + c_{d^-}(v, G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v, G) \equiv \sum_{(v', v'') \subset V \setminus v} \frac{g'(v', v, v'', G)}{g(v', v'', G)},$$

where $g(v, v', G)$ is the number of $(v, v')$ geodesics in $G$, $g'(v, v', v'', G)$ is the number of $(v, v'')$ geodesics in $G$ containing $v'$, and $\frac{g'(v', v, v'', G)}{g(v', v'', G)}$ is taken equal to 0 where $g(v', v'', G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v, G) \equiv \frac{n - 1}{\sum_{v' \in V} d(v, v')},$$

where $d(v, v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v, G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
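The degree and closeness definitions above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the sna implementation: the function names degree_sketch, geodist_sketch, and closeness_sketch are hypothetical, the graph is assumed to be a 0/1 adjacency matrix with A[i, j] = 1 meaning an edge from i to j, and unreachable pairs are given infinite distance (so that closeness becomes 0, as described in the text).

```r
# Base-R sketch (not the sna implementation): degree and closeness computed
# from a 0/1 adjacency matrix A, where A[i, j] = 1 means an edge i -> j.

# Outdegree, indegree, and total ("Freeman") degree.
degree_sketch <- function(A) {
  list(outdeg = rowSums(A), indeg = colSums(A),
       total = rowSums(A) + colSums(A))
}

# All-pairs geodesic distances via Floyd-Warshall; unreachable pairs get Inf.
geodist_sketch <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) {
    # Allow paths through vertex k: D[i, j] = min(D[i, j], D[i, k] + D[k, j])
    D <- pmin(D, outer(D[, k], D[k, ], "+"))
  }
  D
}

# Closeness: (n - 1) divided by the sum of distances from each vertex.
# A vertex that cannot reach some other vertex has an infinite sum, hence 0.
closeness_sketch <- function(A) {
  D <- geodist_sketch(A)
  (nrow(A) - 1) / rowSums(D)
}

# Directed 4-cycle 1 -> 2 -> 3 -> 4 -> 1
A <- matrix(0, 4, 4)
A[cbind(1:4, c(2, 3, 4, 1))] <- 1
degree_sketch(A)$total   # every vertex has one incoming and one outgoing edge
closeness_sketch(A)      # every vertex: 3 / (1 + 2 + 3)

# Removing the edge 4 -> 1 leaves a directed path; only vertex 1 can still
# reach every other vertex, so all other closeness scores drop to 0.
B <- A
B[4, 1] <- 0
closeness_sketch(B)
```

The second call illustrates the fragility noted above: a single missing edge is enough to zero out most closeness scores.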

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter $\alpha$ is by convention set so as to result in a centrality vector of squared length equal to $|V|$ (i.e., the squared scores sum to $|V|$); in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
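The Bonacich formula and its two limiting cases can be checked numerically with a minimal base-R sketch. The function name bonpow_sketch is hypothetical and this is not the sna code; it assumes an adjacency matrix for which $(I - \beta A)$ is invertible, and it assumes that $\alpha$ is chosen so that the squared scores sum to $|V|$.

```r
# Base-R sketch (not the sna implementation) of Bonacich power centrality,
# c = alpha * (I - beta * A)^{-1} A 1, for an adjacency matrix A.
bonpow_sketch <- function(A, beta) {
  n <- nrow(A)
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))  # (I - beta A)^{-1} A 1
  raw <- as.vector(raw)
  raw * sqrt(n / sum(raw^2))  # assumed scaling: sum of squared scores is |V|
}

# Undirected path on three vertices: 1 - 2 - 3
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), 3, 3, byrow = TRUE)

# beta = 0: scores are proportional to degree (row sums of A).
b0 <- bonpow_sketch(A, 0)
b0 / rowSums(A)

# beta just below 1 / (principal eigenvalue): scores approach the
# (identically rescaled) eigenvector centrality.
e <- eigen(A)
ev <- abs(e$vectors[, 1])
ev <- ev * sqrt(nrow(A) / sum(ev^2))
bnear <- bonpow_sketch(A, (1 - 1e-6) / e$values[1])
max(abs(bnear - ev))

# beta < 0: being tied to central alters now lowers a vertex's score.
bonpow_sketch(A, -0.5)
```

Using solve(M, b) rather than forming the inverse explicitly is a standard numerical choice; it fails with an error when $(I - \beta A)$ is singular, which corresponds to the "where a solution exists" caveat in the text.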

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ associated with $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
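The role definitions above translate directly into a brute-force count over ordered triads. The following base-R sketch is illustrative only: brokerage_sketch is a hypothetical name, it returns raw counts without the expectations or z-scores that the text attributes to sna's brokerage function, and it assumes a 0/1 adjacency matrix with A[i, j] = 1 for an edge from i to j.

```r
# Base-R sketch (not the sna implementation) of Gould-Fernandez brokerage
# counts. A is a 0/1 adjacency matrix and s a vector of group labels.
brokerage_sketch <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper",
             "representative", "liaison", "total")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (i == j || j == k || i == k) next
        # j brokers (i, k) when i -> j and j -> k exist but i -> k does not
        if (!(A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0)) next
        if (s[i] == s[j] && s[j] == s[k]) {
          role <- "coordinator"      # all three states equal
        } else if (s[i] == s[k]) {
          role <- "itinerant"        # s_i = s_k, broker differs
        } else if (s[j] == s[k]) {
          role <- "gatekeeper"       # s_j = s_k, sender differs
        } else if (s[i] == s[j]) {
          role <- "representative"   # s_i = s_j, receiver differs
        } else {
          role <- "liaison"          # all three states differ
        }
        out[j, role] <- out[j, role] + 1
        out[j, "total"] <- out[j, "total"] + 1
      }
    }
  }
  out
}

# Two-path 1 -> 2 -> 3 with no direct 1 -> 3 edge.
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1
rep_case <- brokerage_sketch(A, c("a", "a", "b"))  # vertex 2: representative
iti_case <- brokerage_sketch(A, c("a", "b", "a"))  # vertex 2: itinerant

# Adding the direct edge 1 -> 3 removes the brokerage opportunity entirely.
A2 <- A
A2[1, 3] <- 1
none_case <- brokerage_sketch(A2, c("a", "a", "b"))
```

The triple loop costs on the order of $|V|^3$ operations, which is fine for small illustrative graphs but is not meant as an efficient implementation.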

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores:

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $(i, j, k) \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores[2]) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),[3] although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

[2] For instance, when all vertices are automorphically equivalent.
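The equivalence of the two forms of the centralization formula, and the normalization step, can be illustrated with a short base-R sketch. The function names are hypothetical and this is not sna's centralization routine. The normalizing constant $(n - 1)(n - 2)$ is an assumption that holds for degree centralization on undirected, loopless graphs of order $n$, where the star graph attains the maximum.

```r
# Base-R sketch (not the sna implementation) of Freeman centralization for a
# vector of centrality scores.
centralization_raw <- function(scores) {
  sum(max(scores) - scores)                      # summed-deviation form, (1)
}
centralization_alt <- function(scores) {
  length(scores) * (max(scores) - mean(scores))  # max-minus-mean form, (2)
}

# Normalized degree centralization for an undirected graph without loops.
# Assumption: the maximum raw value over graphs of order n is (n - 1)(n - 2),
# attained by the star graph.
degree_centralization <- function(A) {
  n <- nrow(A)
  centralization_raw(rowSums(A)) / ((n - 1) * (n - 2))
}

# Star on 5 vertices (vertex 1 is the hub): maximally centralized.
star <- matrix(0, 5, 5)
star[1, -1] <- 1
star[-1, 1] <- 1

# Cycle on 5 vertices: every vertex has degree 2, so no centralization.
cyc <- matrix(0, 5, 5)
idx <- cbind(1:5, c(2:5, 1))
cyc[idx] <- 1
cyc[idx[, 2:1]] <- 1

centralization_raw(rowSums(star))  # agrees with centralization_alt
centralization_alt(rowSums(star))
degree_centralization(star)
degree_centralization(cyc)
```

For other centrality measures the normalizing constant differs, which is why the text describes the centrality function itself as supplying the theoretical maximum deviation.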

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

[3] $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }

R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization “for free” when working with their own centrality routines, so long as they support the required calling argument.
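The calling convention described above can be imitated in base R. The following is a minimal sketch under stated assumptions, not sna's implementation: `outdeg_cent` and `toy_centralization` are hypothetical names, the graph is assumed to be a loopless 0/1 adjacency matrix, and the theoretical maximum deviation for outdegree is taken to be $(n-1)^2$, the value attained by an out-star.

```r
# Outdegree centrality for a loopless 0/1 adjacency matrix (hypothetical helper).
outdeg_cent <- function(dat, tmaxdev = FALSE, ...) {
  n <- NROW(dat)
  if (tmaxdev) {
    # Largest possible sum of deviations: one vertex sends n-1 ties, the rest none.
    return((n - 1)^2)
  }
  rowSums(dat)
}

# Normalized centralization: sum of deviations from the maximum score,
# divided by the theoretical maximum reported by the centrality function.
toy_centralization <- function(dat, FUN, ...) {
  scores <- FUN(dat, ...)
  sum(max(scores) - scores) / FUN(dat, tmaxdev = TRUE, ...)
}

# An out-star on 5 vertices: vertex 1 sends a tie to everyone else.
star <- matrix(0, 5, 5)
star[1, -1] <- 1
toy_centralization(star, outdeg_cent)
```

Any user-written score function with the same two behaviors (scores by default, maximum deviation on request) can be substituted for `outdeg_cent` in this sketch.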

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The “structure statistics” of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
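The definition of the structure statistics can be checked on a small graph with base R alone. The sketch below is illustrative and does not reproduce sna's geodist or structure.statistics; `geodesic_matrix` and `structure_stats` are hypothetical helper names, and the input is assumed to be a 0/1 adjacency matrix.

```r
# Geodesic distances via Floyd-Warshall; unreachable pairs get distance Inf.
geodesic_matrix <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n)) {
    # Allow paths that pass through vertex k.
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), i = 0..N-1.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_matrix(adj)
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

# Undirected path 1 - 2 - 3.
path3 <- matrix(0, 3, 3)
path3[1, 2] <- path3[2, 1] <- 1
path3[2, 3] <- path3[3, 2] <- 1
structure_stats(path3)
```

For the three-vertex path, the nine ordered pairs comprise three at distance 0, four at distance 1, and two at distance 2, so the series is 3/9, 7/9, 1.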

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $E s(H) = s(G), E s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
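The logic of a one-tailed CUG test can be sketched in base R for the case of conditioning on order and number of edges. This is an illustration of the definition above rather than of cugtest itself; `mutual_count`, `draw_given_edges`, and `toy_cug` are hypothetical names, and graphs are assumed to be loopless 0/1 adjacency matrices.

```r
# Number of mutual dyads in a loopless 0/1 adjacency matrix.
mutual_count <- function(adj) sum(adj * t(adj)) / 2

# Draw a digraph uniformly given order and number of edges by shuffling
# the off-diagonal cells of the adjacency matrix.
draw_given_edges <- function(adj) {
  off <- row(adj) != col(adj)
  out <- adj
  out[off] <- sample(adj[off])
  out
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
toy_cug <- function(adj, stat, reps = 1000) {
  obs <- stat(adj)
  sim <- replicate(reps, stat(draw_given_edges(adj)))
  c(upper = mean(sim >= obs), lower = mean(sim <= obs))
}

set.seed(1)
g <- matrix(rbinom(100, 1, 0.3), 10, 10)
diag(g) <- 0
toy_cug(g, mutual_count, reps = 200)
```

Because the two tails both include ties with the observed value, the upper and lower estimates sum to at least 1; a statistic that is itself conditioned on (such as the edge count here) yields 1 for both.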

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])

$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]

R> g4[2,,] <- g2[1,,]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
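The distance-then-cluster workflow can be sketched with base R. The block below is an assumption-laden illustration, not the sedist or equiv.clust code: `se_hamming` is a hypothetical helper that compares the rows and columns of two vertices while ignoring the cells that involve the pair itself, and complete-linkage clustering is chosen arbitrarily.

```r
# Hamming distance between the tie profiles of every pair of vertices,
# ignoring the cells that involve the two vertices being compared.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      others <- setdiff(seq_len(n), c(i, j))
      d[i, j] <- sum(adj[i, others] != adj[j, others]) +  # ties sent
        sum(adj[others, i] != adj[others, j])              # ties received
    }
  }
  d
}

# Undirected star: vertex 1 is the hub, vertices 2-4 are pendants.
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1

d <- se_hamming(star)
clus <- hclust(as.dist(d), method = "complete")
memb <- cutree(clus, k = 2)  # two positions: the hub and the pendants
memb
```

The pendants are exactly structurally equivalent (pairwise distance 0), so they merge first and share a position, while the hub is left in a class of its own.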

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i, j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
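Density reduction and expansion are simple enough to sketch in base R. The functions below are hypothetical stand-ins (`block_density`, `expand_density`), not the blockmodel or blockmodel.expand code; they assume loopless 0/1 graphs and treat a block with no off-diagonal cells as having density 0.

```r
# Mean off-diagonal cell value within each pair of classes (density block type).
block_density <- function(adj, memb) {
  classes <- sort(unique(memb))
  dens <- matrix(0, length(classes), length(classes))
  off <- row(adj) != col(adj)
  for (a in seq_along(classes)) {
    for (b in seq_along(classes)) {
      cells <- outer(memb == classes[a], memb == classes[b]) & off
      # Assumption: an empty block (singleton class on the diagonal) gets density 0.
      dens[a, b] <- if (any(cells)) mean(adj[cells]) else 0
    }
  }
  dens
}

# Expand a density image: class a receives sizes[a] vertices, and each
# off-diagonal cell is an independent Bernoulli draw with the block density.
expand_density <- function(dens, sizes) {
  memb <- rep(seq_along(sizes), times = sizes)
  p <- dens[memb, memb, drop = FALSE]
  n <- length(memb)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0
  g
}

set.seed(42)
adj <- matrix(rbinom(36, 1, 0.4), 6, 6)
diag(adj) <- 0
memb <- c(1, 1, 2, 2, 3, 3)
img <- block_density(adj, memb)
g_new <- expand_density(img, sizes = c(3, 3, 3))  # order 9 rather than 6
dim(g_new)
```

Calling `expand_density` repeatedly with the same image yields independent draws from the implied inhomogeneous Bernoulli graph model, and the class sizes need not match those of the graph from which the image was estimated.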

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
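The boolean inner product is easy to verify by hand. The following base R sketch uses a hypothetical helper name (`compose`) and assumes 0/1 adjacency matrices on a common vertex set; it also shows how a loop can appear in the composition of two loopless graphs.

```r
# Boolean inner product of two 0/1 adjacency matrices on the same vertex set:
# cell (v, v') is 1 iff some v'' has a[v, v''] = 1 and b[v'', v'] = 1.
compose <- function(a, b) {
  ((a %*% b) > 0) * 1
}

g <- matrix(0, 3, 3)
g[1, 2] <- 1                 # G:  1 -> 2
gp <- matrix(0, 3, 3)
gp[2, 1] <- 1                # G': 2 -> 1
gp[2, 3] <- 1                # G': 2 -> 3

gc <- compose(g, gp)
gc
```

Here vertex 1 reaches vertex 2 in the first graph, and vertex 2 reaches vertices 1 and 3 in the second, so the composition contains the edge (1, 3) and the loop (1, 1).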

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network, without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| \, (|V| - 1)}, \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V| (|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G) \, \mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
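The graph covariance, graph correlation, and Hamming distance for a directed pair can be written directly from these definitions. The base R sketch below uses hypothetical helper names (`off_diag`, `graph_cov`, `graph_cor`, `hamming`) and is not the gcov, gcor, or hdist code; it assumes loopless graphs on the same labeled vertex set.

```r
# Off-diagonal cells of an adjacency matrix, i.e., the |V|(|V| - 1) edge variables.
off_diag <- function(a) a[row(a) != col(a)]

# Graph covariance: mean cross-product of centered edge variables.
graph_cov <- function(g, h) {
  x <- off_diag(g)
  y <- off_diag(h)
  mean((x - mean(x)) * (y - mean(y)))   # divides by |V|(|V| - 1)
}

# Graph correlation: covariance scaled by the two graph standard deviations.
graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}

# Hamming distance: number of edge variables on which the graphs differ.
hamming <- function(g, h) sum(off_diag(g) != off_diag(h))

g <- matrix(0, 4, 4)
g[1, 2] <- g[2, 3] <- g[3, 4] <- 1      # directed path 1 -> 2 -> 3 -> 4
h <- 1 - g
diag(h) <- 0                            # complement of g

c(cor = graph_cor(g, h), hamming = hamming(g, h))
```

A graph and its complement disagree on every edge variable, so the Hamming distance is $|V|(|V|-1) = 12$ and the graph correlation is $-1$.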

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
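For very small graphs, the maximization over matchings can be carried out by brute force, which makes the idea concrete. The sketch below is a base R illustration of exhaustive search only; it is not gscor, it does not use annealing, and the helper names (`graph_cor`, `all_perms`, `struct_cor`) are hypothetical. All vertices are assumed to be exchangeable.

```r
off_diag <- function(a) a[row(a) != col(a)]
graph_cor <- function(g, h) cor(off_diag(g), off_diag(h))

# All permutations of the elements of v (only sensible for very small vectors).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Structural correlation: the largest graph correlation over all relabelings of h.
struct_cor <- function(g, h) {
  best <- -Inf
  for (p in all_perms(seq_len(nrow(h)))) {
    best <- max(best, graph_cor(g, h[p, p]))
  }
  best
}

g <- matrix(0, 4, 4)
g[1, 2] <- g[2, 3] <- g[3, 4] <- g[1, 3] <- 1
relabel <- c(3, 1, 4, 2)
h <- g[relabel, relabel]               # an isomorphic copy of g

c(labeled = graph_cor(g, h), structural = struct_cor(g, h))
```

The search visits $n!$ relabelings, which is why exhaustive methods become infeasible beyond a handful of vertices and heuristic optimizers are used in practice.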

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
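The QAP null hypothesis for a bivariate statistic can be sketched in a few lines of base R. This is an illustration of the permutation logic only, not the qaptest implementation; `graph_cor` and `toy_qap` are hypothetical names, and the simulated data are arbitrary.

```r
off_diag <- function(a) a[row(a) != col(a)]
graph_cor <- function(g, h) cor(off_diag(g), off_diag(h))

# QAP test for the graph correlation: relabel the vertices of h at random
# (rows and columns permuted together) and recompute the statistic.
toy_qap <- function(g, h, reps = 1000) {
  obs <- graph_cor(g, h)
  n <- nrow(h)
  sim <- replicate(reps, {
    p <- sample(n)
    graph_cor(g, h[p, p])
  })
  c(observed = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

set.seed(7)
x <- matrix(rbinom(100, 1, 0.5), 10, 10)
diag(x) <- 0
noise <- matrix(rbinom(100, 1, 0.1), 10, 10)
y <- abs(x - noise)                    # x with roughly 10% of cells flipped
diag(y) <- 0

toy_qap(x, y, reps = 200)
```

Permuting rows and columns jointly preserves the structure of `h` while breaking its alignment with `g`, which is what distinguishes the QAP reference distribution from one obtained by shuffling individual cells.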


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> nl <- netlm(y, x)

R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> y.l <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> y.p <- apply(y.l, c(1, 2), function(a){1 / (1 + exp(-a))})

R> y <- rgraph(20, tprob = y.p)

R> nl <- netlogit(y, x)

R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173 0.680   0.320   0.503
x1           0.9411361   2.5628914 0.985   0.015   0.019
x2           4.1473292  63.2648084 1.000   0.000   0.000
x3           1.8630911   6.4436238 1.000   0.000   0.000
x4          -0.1757242   0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
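To make the role of the error rates concrete, consider the simplest case in which they are given as known. The base R sketch below applies Bayes' rule to a single potential edge, assuming that informants report independently given the true edge state; `edge_posterior` is a hypothetical helper, not part of bbnam, which additionally handles unknown error rates via Gibbs sampling.

```r
# reports: 0/1 vector with one report per informant for a single potential edge
# ep, em:  false positive and false negative rates for each informant
# prior:   prior probability that the edge is present (Bernoulli graph prior)
edge_posterior <- function(reports, ep, em, prior = 0.5) {
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))   # likelihood if the edge is present
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # likelihood if the edge is absent
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Four informants, three of whom report the edge.
edge_posterior(c(1, 1, 0, 1), ep = rep(0.05, 4), em = rep(0.4, 4))
```

With a low false positive rate, even a minority of positive reports can yield a high posterior edge probability, whereas informants with error rates of 0.5 contribute no information and leave the prior unchanged.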

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
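The conditional tie probability is straightforward to evaluate once the bias probabilities and event counts are given. The helper below (`biased_net_prob`, a hypothetical name) is a direct base R transcription of the expression above; the numerical inputs are illustrative assumptions and are not tied to any particular parameterization used by sna.

```r
# p_base: Pr(B_e), the baseline tie probability
# p_bias: vector of Pr(B_k), one entry per type of bias event
# s:      vector of s_k(i, j, T), the number of potential events of each type
biased_net_prob <- function(p_base, p_bias, s) {
  # The tie is absent only if the baseline event and every bias event fail.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline 0.17 with one potential bias event of probability 0.5.
biased_net_prob(0.17, p_bias = 0.5, s = 1)

# Baseline 0.17 with two potential events of a bias with probability 0.25.
biased_net_prob(0.17, p_bias = 0.25, s = 2)
```

When no bias events are possible ($s_k = 0$ for all $k$) the expression reduces to the baseline probability, and any bias with probability 1 and $s_k \ge 1$ forces the tie.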

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$

where y isin Rn is a vector of responses X isin Rntimesx is a covariate matrix W isin Rwtimesntimesn andZ isin Rztimesntimesn are interaction arrays β isin Rx θ isin Rw and ψ isin Rz are free parameters andν sim Norm(0 σ2) is a vector of iid disturbances Z and ψ combine to form a network movingaverage (MA) term which expresses the extent to which disturbances diffuse through thenetwork Analogously W and θ describe autocorrelation structure in the responses (net-work AR effects) Pragmatically the distinction between the two effect types is the latterrsquosinclusion of impact from neighborsrsquo covariate scoresmdashan AR term implies that each individ-ualrsquos response depends on that of their neighbors (including all covariate disturbance andhigher-order neighborhood effects) while an MA term implies that conditional dependencebetween responses is limited to deviations from the expectation It is thus possible to specifyAR and MA effects in isolation as well as jointly Within sna the lnam function performsmaximum likelihood estimation for network autocorrelation models To aid in identifyingappropriate weight matrices for use with lnam sna also supplies a function (nacf) for com-putation of sample network autocorrelation and autocovariance functions nacf can computecorrelationscovariances for partial and complete in- out- and combined neighborhoods ofvarious orders as well as autocorrelation indices such as Moranrsquos I (Moran 1950) and GearyrsquosC (Geary 1954) Prior inspection of network autocorrelation functions can aid in proposingweight matrices for subsequent evaluation (in analogy to similar heuristics within the timeseries literature see eg Brockwell and Davis 1991) Functions such as sedist can also beused to construct matrices based on other structural properties (eg structural equivalence)see Leenders (2002) for a useful discussion
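To make the notion of a network autocorrelation index concrete, the sketch below computes Moran's I in its standard textbook form using only base R. It is not the code used by nacf; the function name morans_i and the small example graph are hypothetical choices made for illustration.

```r
# Moran's I for an attribute vector x and a weight (adjacency) matrix W:
#   I = (n / sum(W)) * sum_ij W[i, j] * z[i] * z[j] / sum_i z[i]^2,
# where z = x - mean(x)
morans_i <- function(x, W) {
  stopifnot(is.matrix(W), nrow(W) == length(x), ncol(W) == length(x))
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Hypothetical example: an undirected path 1 - 2 - 3 - 4 whose vertex
# attribute increases steadily along the path
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1   # edges 1-2, 2-3, 3-4
W <- W + t(W)             # symmetrize
morans_i(c(1, 2, 3, 4), W)
```

Because adjacent vertices carry similar values in this example, the index is positive; a pattern in which neighbors tend to differ would push it below zero.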


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical Beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
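One rough way to check whether a candidate pair of error rate priors places much mass on the "perverse" region is to simulate from them. The base R sketch below does this for independent Beta(2, 11) priors like those used in the example; the variable names, the seed, and the number of draws are arbitrary choices made for illustration and are not part of bbnam.

```r
# Monte Carlo estimate of the prior mass on the region em + ep > 1,
# assuming independent Beta(2, 11) priors on both error rates
set.seed(1)                          # arbitrary seed, for reproducibility
n_draws <- 1e5
em_prior <- rbeta(n_draws, 2, 11)    # draws of the false negative rate
ep_prior <- rbeta(n_draws, 2, 11)    # draws of the false positive rate
perverse_mass <- mean(em_prior + ep_prior > 1)
perverse_mass
```

A value close to zero suggests the priors are unlikely to be the source of trouble; the same check can be repeated with posterior draws in place of the prior draws.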

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

[Figure 5: Examples of the use of gplot supplemental functions. Panels: gplot.vertex Example; gplot.arrow Example; gplot.loop Example.]

all of which seek to quantify particular aspects of relational structure. Broadly speaking, the most commonly used indices may be divided into two classes: node-level indices (NLIs), which express properties of the positions of particular vertices; and graph-level indices (GLIs), which express properties of entire graphs. More formally, node-level indices can be thought of as mappings of the general form $f: V \times G \mapsto \mathbb{R}$, where $G$ is the set of graphs on which $f$ is defined (with associated vertex set $V$). Graph-level indices, by contrast, are of the form $f: G \mapsto \mathbb{R}$. Although this framework is easily extended to incorporate covariates, indices of this type are uncommon; we will see an important counterexample below, however.

Node-level indices

Of the node-level indices, the most well-developed are the centrality indices. Formal characterization of centrality indices as a distinct class of NLIs has proved elusive (though see efforts by Sabidussi (1966) and Brandes and Erlebach (2005), chapters 3–5), but all intuitively reflect some sense in which a vertex occupies a prominent or "central" position within a graph. Among the most widely used centrality indices are those of Freeman (1979), which reflect a standardized "paring down" of a range of similar measures used in earlier work. These indices (degree, betweenness, and closeness) are implemented in sna via the eponymous degree, betweenness, and closeness functions. Degree, a standard graph theoretic concept, is given by $c_d(v,G) \equiv |N(v)|$ for undirected $G$. In the directed case, three notions of degree are generally encountered: outdegree ($c_{d^+}(v,G) \equiv |N^+(v)|$), indegree ($c_{d^-}(v,G) \equiv |N^-(v)|$), and total or "Freeman" degree ($c_{d_t}(v,G) \equiv c_{d^+}(v,G) + c_{d^-}(v,G)$). All of these are supported via degree. Betweenness measures the extent to which a given vertex lies on non-redundant geodesics between third parties. The index is formally defined as

$$c_b(v,G) \equiv \sum_{(v',v'') \subset V \setminus v} \frac{g'(v',v,v'',G)}{g(v',v'',G)},$$

where $g(v,v',G)$ is the number of $(v,v')$ geodesics in $G$, $g'(v,v',v'',G)$ is the number of $(v,v'')$ geodesics in $G$ containing $v'$, and $g'(v',v,v'',G)/g(v',v'',G)$ is taken equal to 0 where $g(v',v'',G) = 0$. A close variant, stress centrality, is identical save for the denominator of the geodesic count ratio, which is set to 1 (Shimbel 1953); this is implemented by stresscent in sna. Finally, closeness is given by

$$c_c(v,G) \equiv \frac{n-1}{\sum_{v' \in V} d(v,v')},$$

where $d(v,v')$ is the geodesic distance from vertex $v$ to vertex $v'$. Closeness is ill-defined on graphs which are not strongly connected, unless distances between disconnected vertices are taken to be infinite. In this case, $c_c(v,G) = 0$ for any $v$ lacking a path to any vertex, and hence all closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.
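As a rough illustration of the degree and closeness definitions above, the following base R sketch computes both from an adjacency matrix. The example digraph, the helper geodist_fw (a plain Floyd-Warshall pass), and the variable names are all hypothetical; in practice sna's own degree and closeness functions would be used instead.

```r
# Hypothetical 4-vertex digraph; A[i, j] == 1 denotes an edge from i to j
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

outdeg <- rowSums(A)        # |N+(v)|: ties sent
indeg  <- colSums(A)        # |N-(v)|: ties received
totdeg <- outdeg + indeg    # total ("Freeman") degree

# All-pairs geodesic distances via Floyd-Warshall; unreachable pairs stay Inf
geodist_fw <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)
  diag(D) <- 0
  for (k in seq_len(n))
    for (i in seq_len(n))
      for (j in seq_len(n))
        D[i, j] <- min(D[i, j], D[i, k] + D[k, j])
  D
}

# Closeness: (n - 1) divided by the sum of distances from v to all vertices;
# an infinite distance sends the score to 0, as described in the text
close_c <- (nrow(A) - 1) / rowSums(geodist_fw(A))
```

The triple loop is cubic in the number of vertices, which is adequate for a toy example but not for large graphs.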

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of $A$ (where $A$ is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., $v$ is central to the extent to which it has many ties to other central nodes), or of the ability of $v$ to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as $\beta$ approaches the reciprocal of the principal eigenvalue of $A$, and degree as $\beta$ approaches 0. Setting $\beta < 0$ reverses the sense of the dependence of centrality scores across vertices: where $\beta$ is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of $\beta$. The scaling parameter, $\alpha$, is by convention set so as to result in a centrality vector of length equal to $|V|$; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
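The limiting behavior of the Bonacich power measure can be checked numerically. The sketch below uses only base R; the function name bonacich_power, the default of alpha = 1 (i.e., no rescaling), and the small undirected example graph are assumptions made for illustration and do not reproduce sna's bonpow, which applies its own rescaling.

```r
# Bonacich power: alpha * (I - beta * A)^{-1} A 1, left unscaled by default
bonacich_power <- function(A, beta, alpha = 1) {
  n <- nrow(A)
  as.vector(alpha * solve(diag(n) - beta * A, A %*% rep(1, n)))
}

# Hypothetical undirected graph with edges 1-2, 1-3, 2-3, 3-4
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# beta = 0: the measure reduces to A 1, i.e., (out)degree
bp_zero <- bonacich_power(A, beta = 0)

# beta just below 1 / lambda_1: scores line up with the principal eigenvector
eig <- eigen(A)
bp_near <- bonacich_power(A, beta = 0.999 / eig$values[1])
cor(bp_near, abs(eig$vectors[, 1]))

# beta < 0: ties to well-connected alters now reduce a vertex's score
bp_neg <- bonacich_power(A, beta = -0.2)
```

solve will fail when I - beta * A is singular (for instance, when beta equals the reciprocal of an eigenvalue of A), which corresponds to the "where a solution exists" caveat in the text.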

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex, $v$, is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which $v$ serves as a local bridge. Now, let us posit a vector of states $s$ on $V$, such that $s_i$ is the state of $v_i \in V$. ("State," in this case, can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles) based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex $v$ with respect to a particular role is defined as the number of ordered triads of the appropriate type for which $v$ is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed $s$ and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
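The role definitions above can be made concrete with a short base-R sketch. This is not sna's brokerage implementation: the function name gf_brokerage and the column labels are hypothetical, the input is assumed to be a 0/1 adjacency matrix with A[i, j] = 1 when i sends a tie to j, and no moments or z-tests are computed.

```r
# Count Gould-Fernandez brokerage roles for every vertex of a digraph.
# A: 0/1 adjacency matrix; s: vector of vertex states (group labels).
gf_brokerage <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper", "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) for (i in seq_len(n)) for (k in seq_len(n)) {
    if (length(unique(c(i, j, k))) < 3) next
    # j brokers the ordered pair (i, k) when i -> j -> k exists but i -> k does not
    if (!(A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0)) next
    role <- if (s[i] == s[j] && s[j] == s[k]) {
      "coordinator"
    } else if (s[i] == s[k]) {
      "itinerant"        # s_i = s_k, s_i != s_j
    } else if (s[j] == s[k]) {
      "gatekeeper"       # s_j = s_k, s_i != s_j
    } else if (s[i] == s[j]) {
      "representative"   # s_i = s_j, s_j != s_k
    } else {
      "liaison"          # all three states differ
    }
    out[j, role] <- out[j, role] + 1
  }
  cbind(out, total = rowSums(out))
}

# A single open two-path 1 -> 2 -> 3, brokered by vertex 2
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 3] <- 1
gf_brokerage(A, c("a", "b", "c"))
```

The triple loop is cubic in the number of vertices, which is acceptable for illustration but not for large graphs.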

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)

R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985

R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+     evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE

R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782
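The proportionality to outdegree at exponent = 0 follows directly from the usual matrix form of the Bonacich measure, $c(\beta) \propto (I - \beta A)^{-1} A \mathbf{1}$. The base-R sketch below evaluates that form without any normalization; bonacich_power is a hypothetical helper, and sna's bonpow applies its own scaling, so values will differ from bonpow output by a constant factor.

```r
# Unnormalized Bonacich power: solve (I - beta * A) x = A 1 for x.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  drop(solve(diag(n) - beta * A, A %*% rep(1, n)))
}

# Small fixed digraph (rows send, columns receive)
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 1, 0, 0), 4, 4, byrow = TRUE)

bonacich_power(A, 0)      # equals the outdegrees, rowSums(A)
bonacich_power(A, -0.1)   # negative decay: ties to weak alters count for more
```

The linear system is singular when 1 / beta is an eigenvalue of A, so this sketch should only be used with decay values away from those points.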

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)

R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
            t    E(t)   Sd(t)       z Pr(>|z|)
w_I    5.0000  5.8638  2.7314 -0.3162   0.7518
w_O   25.0000 19.5459  7.0713  0.7713   0.4405
b_IO  18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI  17.0000 19.5459  6.2244 -0.4090   0.6825
b_O   28.0000 23.4551  5.3349  0.8519   0.3943
t     93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from $i$ to $k$, $i$ should also be tied to $k$ directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph $G$ with respect to centrality measure $c$ is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing $C(G)$ by its maximum across all graphs of the same order as $G$. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation, as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
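Equations 1 and 2 are easy to check numerically. The following base-R sketch (not the sna centralization function; freeman_centralization is a hypothetical name) uses degree on an undirected star, and assumes the standard result that the maximum raw degree centralization for an undirected simple graph on n vertices is (n - 1)(n - 2).

```r
# Raw Freeman centralization (Equation 1): total deviation from the maximum score
freeman_centralization <- function(scores) sum(max(scores) - scores)

# Undirected star on 6 vertices: vertex 1 is tied to every other vertex
n <- 6
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1
deg <- rowSums(star)

raw  <- freeman_centralization(deg)      # Equation 1
alt  <- n * (max(deg) - mean(deg))       # Equation 2
tmax <- (n - 1) * (n - 2)                # assumed theoretical maximum (see lead-in)

c(raw = raw, alt = alt, normalized = raw / tmax)
```

Because the star attains the maximum for degree, the normalized score here is 1; a regular graph, in which all degrees are equal, would give 0.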

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))

R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.
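For readers who want to see what the simplest of these indices measure, the sketch below computes density, dyadic and edgewise reciprocity, and weak transitivity directly from a 0/1 adjacency matrix in base R. It is an illustration of the definitions given earlier, not sna's code: gli_sketch is a hypothetical helper, it assumes a loopless digraph with no missing data, and it returns NaN where a denominator is zero.

```r
gli_sketch <- function(A) {
  n <- nrow(A)
  diag(A) <- 0
  up  <- upper.tri(A)            # one entry per dyad
  off <- row(A) != col(A)        # all ordered pairs i != k
  mutual <- sum(A[up] == 1 & t(A)[up] == 1)
  null   <- sum(A[up] == 0 & t(A)[up] == 0)
  paths  <- A %*% A              # paths[i, k] = number of two-paths i -> j -> k
  c(density        = sum(A) / (n * (n - 1)),
    dyadic_recip   = (mutual + null) / choose(n, 2),
    edgewise_recip = 2 * mutual / sum(A),
    weak_trans     = sum((paths * A)[off]) / sum(paths[off]))
}

# Directed 3-cycle: no reciprocity, no transitive two-paths
cyc <- matrix(0, 3, 3)
cyc[cbind(1:3, c(2, 3, 1))] <- 1
gli_sketch(cyc)

# Transitive triple 1 -> 2 -> 3 with 1 -> 3
tr <- matrix(0, 3, 3)
tr[1, 2] <- tr[2, 3] <- tr[1, 3] <- 1
gli_sketch(tr)
```

Note that the dyadic measure counts null dyads as symmetric, which is why very sparse graphs can score highly on it while having an edgewise reciprocity of zero.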

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+     n <- NROW(dat)
+     if (tmaxdev)
+         return((n - 1) * choose(n - 1, 2))
+     odeg <- degree(dat, cmode = "outdegree")
+     choose(odeg, 2)
+ }

R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
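The definition of the structure statistics translates almost line for line into code. The base-R sketch below is a hypothetical helper (structure_stats), not sna's structure.statistics; it finds geodesic distances by repeated boolean matrix multiplication, which is simple but slower than a breadth-first search on large graphs.

```r
structure_stats <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)                  # geodesic distances, Inf if unreachable
  diag(D) <- 0
  walk <- diag(n)
  for (k in seq_len(n - 1)) {
    walk <- (walk %*% A > 0) * 1          # pairs joined by some walk of length k
    newly <- walk == 1 & is.infinite(D)   # pairs first reached at step k
    D[newly] <- k
  }
  # s_i = N^-2 * sum_j sum_k I(d(j, k) <= i), for i = 0, ..., N - 1
  s <- sapply(0:(n - 1), function(i) mean(D <= i))
  names(s) <- 0:(n - 1)
  s
}

# Directed 3-cycle: each vertex reaches itself at distance 0,
# one vertex at distance 1 and the remaining vertex at distance 2
cyc <- matrix(0, 3, 3)
cyc[cbind(1:3, c(2, 3, 1))] <- 1
structure_stats(cyc)
```

Since every vertex is at distance 0 from itself, $s_0$ is always $1/N$, and the series reaches 1 only if the graph is strongly connected.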

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
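The logic of a one-tailed CUG test can be sketched in a few lines of base R. The example below is an assumption-laden illustration rather than sna's cugtest: the helper names are hypothetical, the test statistic is the number of mutual dyads, and the null distribution is uniform over digraphs with the same order and number of edges as the observed graph.

```r
# Test statistic: number of mutual dyads
mutual_count <- function(A) {
  up <- upper.tri(A)
  sum(A[up] == 1 & t(A)[up] == 1)
}

# Draw a loopless digraph uniformly at random given order n and edge count m
rgraph_fixed_edges <- function(n, m) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))
  A[sample(off, m)] <- 1
  A
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G))
cug_sketch <- function(A, stat, reps = 1000) {
  diag(A) <- 0
  obs <- stat(A)
  sims <- replicate(reps, stat(rgraph_fixed_edges(nrow(A), sum(A))))
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

# Observed graph: a chain of five mutual dyads on six vertices (10 edges)
A <- matrix(0, 6, 6)
pairs <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6))
A[pairs] <- 1
A[pairs[, 2:1]] <- 1

set.seed(42)
res <- cug_sketch(A, mutual_count, reps = 500)
res
```

Both tail probabilities include ties with the observed value, so they sum to at least 1; a small upper-tail value indicates more mutuality than expected from order and edge count alone.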

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+     dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+     cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]

R> g4[2, , ] <- g2[1, , ]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+     g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:   0
        Mean: -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and outneighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
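A direct check of the neighborhood condition is straightforward. The base-R sketch below uses a hypothetical helper, structurally_equivalent, that compares the ties two vertices send to and receive from all third parties; following the definition as stated, the tie between the two vertices themselves is excluded from the comparison.

```r
# TRUE when v and w have identical ties to (and from) every other vertex
structurally_equivalent <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  same_out <- all(A[v, others] == A[w, others])   # outneighborhoods agree
  same_in  <- all(A[others, v] == A[others, w])   # inneighborhoods agree
  same_out && same_in
}

# Undirected star on 4 vertices: vertex 1 is the hub
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1

structurally_equivalent(star, 2, 3)   # two leaves share the same single alter
structurally_equivalent(star, 1, 2)   # hub and leaf do not
```

Exact checks of this kind rarely succeed on empirical data, which motivates the approximate measures discussed next.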

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
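As an illustration of this workflow, the base-R sketch below computes a Hamming-type positional distance between every pair of vertices and passes it to hclust. It is not sedist: se_hamming is a hypothetical helper, and its handling of the tie between the two vertices being compared (simply ignored) is an assumption that may differ from sna's conventions.

```r
# Number of third parties on which v and w disagree, summed over
# outgoing (row) and incoming (column) ties
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (v in seq_len(n)) for (w in seq_len(n)) {
    others <- setdiff(seq_len(n), c(v, w))
    D[v, w] <- sum(A[v, others] != A[w, others]) +
      sum(A[others, v] != A[others, w])
  }
  D
}

# Undirected star on 4 vertices: the three leaves are exactly equivalent
star <- matrix(0, 4, 4)
star[1, -1] <- 1
star[-1, 1] <- 1

D <- se_hamming(star)
hc <- hclust(as.dist(D), method = "complete")
memb <- cutree(hc, h = 0)   # merge only vertices at distance 0
memb
```

Raising the cut height h relaxes the requirement from exact to approximate equivalence, trading fidelity for a smaller number of positions.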


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
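Density reduction and expansion can both be written compactly in base R. The sketch below is only an approximation of what blockmodel and blockmodel.expand do: block_density and expand_density are hypothetical helpers, diagonal cells are ignored within classes, and undefined within-class densities (singleton classes) are treated as 0 on expansion, which is an assumption of this example.

```r
# Reduce a graph to its density block image given a membership vector
block_density <- function(A, memb) {
  classes <- sort(unique(memb))
  B <- matrix(NA_real_, length(classes), length(classes),
              dimnames = list(classes, classes))
  for (a in seq_along(classes)) for (b in seq_along(classes)) {
    cells <- A[memb == classes[a], memb == classes[b], drop = FALSE]
    if (a == b) cells <- cells[row(cells) != col(cells)]   # neglect the diagonal
    B[a, b] <- mean(cells)
  }
  B
}

# Expand a density image into a new Bernoulli graph with the given class sizes
expand_density <- function(B, sizes) {
  memb <- rep(seq_len(nrow(B)), times = sizes)
  P <- B[memb, memb, drop = FALSE]     # vertex-level edge probabilities
  P[is.nan(P)] <- 0                    # assumption: undefined density -> no ties
  n <- length(memb)
  G <- matrix(rbinom(n * n, 1, P), n, n)
  diag(G) <- 0
  G
}

# Two classes; every member of class 1 sends ties to every member of class 2
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1
B <- block_density(A, c(1, 1, 2, 2))
B
G <- expand_density(B, c(3, 3))        # new graph of order 6, same image
G
```

Because the class sizes passed to the expansion step are free parameters, the simulated graph need not have the same order as the graph from which the image was estimated.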

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition, $G''$, of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
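The boolean inner product mentioned above takes one line of base R. In the sketch below, compose is a hypothetical stand-in for the package operator, applied to plain 0/1 adjacency matrices.

```r
# Composition of two graphs: a tie v -> v' exists in the result iff some
# intermediate v'' has v -> v'' in G1 and v'' -> v' in G2
compose <- function(G1, G2) ((G1 %*% G2) > 0) * 1

# A loopless graph with one mutual dyad (1 <-> 2) and an isolate (3)
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 1] <- 1

AA <- compose(A, A)
AA
diag(AA)   # vertices 1 and 2 acquire loops via the paths 1 -> 2 -> 1 and 2 -> 1 -> 2
```

This small case shows why diagonals matter: the inputs have no loops, yet the composition does.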

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network, without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left( A^G_{ij} - \mu_G \right) \left( A^H_{ij} - \mu_H \right)}{|V| (|V| - 1)}, \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V| (|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
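For very small graphs, the structural Hamming distance can be found by brute force, which makes the idea of optimizing over matchings concrete. The base-R sketch below is a hypothetical illustration (permutations and structural_hamming are not sna functions); it assumes all vertices are freely exchangeable and enumerates every relabeling, so its cost grows factorially with graph order.

```r
# All permutations of a vector, built recursively
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

# Minimum Hamming distance between G and any relabeling of H
structural_hamming <- function(G, H) {
  off <- row(G) != col(G)
  min(sapply(permutations(seq_len(nrow(H))),
             function(p) sum(G[off] != H[p, p][off])))
}

# Two isomorphic one-edge digraphs with different labelings
G <- matrix(0, 3, 3); G[1, 2] <- 1
H <- matrix(0, 3, 3); H[3, 1] <- 1

sum(G != H)                # labeled Hamming distance
structural_hamming(G, H)   # distance after optimal relabeling
```

The gap between the two values is the labeling component; heuristic search becomes necessary once exhaustive enumeration is out of reach.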

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
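The logic of a QAP test can be sketched in a few lines of base R: the rows and columns of one graph are permuted simultaneously, and the observed statistic is compared with its permutation distribution. The helper gcor_manual and the simulated graphs are assumptions made for this illustration; qaptest provides the packaged equivalent.

```r
# Hypothetical helper: product-moment correlation of off-diagonal cells.
gcor_manual <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

set.seed(7)
n <- 15
g1 <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g1) <- 0

# g2 is a noisy copy of g1: each cell is flipped with probability 0.2.
flip <- matrix(rbinom(n * n, 1, 0.2), n, n)
g2 <- abs(g1 - flip)
diag(g2) <- 0

obs <- gcor_manual(g1, g2)

# QAP null distribution: relabel the vertices of g2 at random.
reps <- 1000
null <- replicate(reps, {
  p <- sample(n)
  gcor_manual(g1, g2[p, p])
})

p_upper <- mean(null >= obs)   # estimated Pr(statistic >= observed)
```

Because relabeling preserves the structure of g2 while breaking its alignment with g1, the null distribution conditions on both observed structures.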

Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
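The point estimates above are ordinary least squares estimates on the vectorized off-diagonal cells, which can be checked with base R. The sketch below is an illustration under that assumption: it uses lm and a hypothetical offdiag helper rather than netlm, and simulates the predictor graphs directly instead of calling rgraph.

```r
set.seed(3)
n <- 20

# Hypothetical helper: vector of off-diagonal cells of a square matrix.
offdiag <- function(m) m[row(m) != col(m)]

# Four random predictor graphs and a response built from the first three.
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

df <- data.frame(
  y  = offdiag(y),
  x1 = offdiag(x[1, , ]),
  x2 = offdiag(x[2, , ]),
  x3 = offdiag(x[3, , ]),
  x4 = offdiag(x[4, , ])
)

fit <- lm(y ~ x1 + x2 + x3 + x4, data = df)
est <- unname(coef(fit))        # close to 0, 1, 4, 2, 0
resid_df <- df.residual(fit)    # 20 * 19 cells minus 5 coefficients
```

What netlm adds to this is the QAP-based reference distribution for the coefficients, since the usual independence assumptions behind the lm standard errors do not hold for dyadic data.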

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173  0.680   0.320   0.503
x1           0.9411361   2.5628914  0.985   0.015   0.019
x2           4.1473292  63.2648084  1.000   0.000   0.000
x3           1.8630911   6.4436238  1.000   0.000   0.000
x4          -0.1757242   0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
  352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
  (Dn-Dr)/(Dn-Dr+dfn): 0.481324
  (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

  Null Hypothesis: qap
  Replications: 1000
  Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
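The high density noted here follows directly from the data-generating model, and the fit statistics follow from the contingency table. The short base R sketch below works through both; it assumes the predictors are independent with tie probability 0.5 (the rgraph default) and takes the table counts from the output above.

```r
# Expected density of y: average the inverse logit of the linear predictor
# over the eight equally likely combinations of (x1, x2, x3).
combos <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1)
eta <- with(combos, x1 + 4 * x2 + 2 * x3)
expected_density <- mean(plogis(eta))

# Contingency table from the summary output (rows: predicted, cols: actual).
tab <- matrix(c(0, 39, 0, 341), nrow = 2,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

total_correct <- sum(diag(tab)) / sum(tab)             # 341 / 380
false_positive_rate <- tab["1", "0"] / sum(tab[, "0"]) # 39 / 39
```

Since every dyad is predicted to be a tie, the fraction correct simply equals the observed density, and no actual null dyad can be classified correctly.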

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
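For intuition about the error-rate parameterization, consider the simplest case in which the false negative and false positive rates are treated as known. The posterior probability of a single edge then follows from Bayes' rule, as in the base R sketch below. The function edge_posterior and the example values are assumptions introduced for illustration; this is not the bbnam implementation, which also samples the error rates.

```r
# Hypothetical helper: posterior probability that an edge is present, given
# binary reports y from several informants with known error rates.
#   em: false negative rates, ep: false positive rates, prior: Pr(edge).
edge_posterior <- function(y, em, ep, prior = 0.5) {
  lik_present <- prod(ifelse(y == 1, 1 - em, em))  # Pr(reports | edge)
  lik_absent  <- prod(ifelse(y == 1, ep, 1 - ep))  # Pr(reports | no edge)
  prior * lik_present / (prior * lik_present + (1 - prior) * lik_absent)
}

em <- rep(0.375, 5)   # false negatives are common
ep <- rep(0.038, 5)   # false positives are rare

# Two of five informants report the edge: about 0.94.
p_minority <- edge_posterior(c(1, 1, 0, 0, 0), em, ep)

# Nobody reports the edge: about 0.009.
p_none <- edge_posterior(c(0, 0, 0, 0, 0), em, ep)
```

With these rates a positive report carries far more evidence than a negative one, so an edge reported by only a minority of informants is still very likely to exist.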

Also supported by sna are the methods for estimating biased net parameters described by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model, then, involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
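The conditional tie probability is straightforward to evaluate numerically. The following base R sketch does so for a single directed dyad; the function name biased_tie_prob and the example bias probabilities and counts are hypothetical values chosen only to illustrate the formula.

```r
# Hypothetical helper: Pr(e_ij | T) under the biased net model.
#   p_base: baseline probability Pr(B_e)
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      number of potential occurrences s_k of each bias event
biased_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Two kinds of bias event, the second with two potential occurrences.
p <- biased_tie_prob(p_base = 0.05, p_bias = c(0.3, 0.2), s = c(1, 2))
# 1 - 0.95 * 0.7 * 0.8^2 = 0.5744

# With no potential bias events the baseline probability is recovered.
p0 <- biased_tie_prob(p_base = 0.05, p_bias = c(0.3, 0.2), s = c(0, 0))
```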

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

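$$y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu \qquad (5)$$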

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
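Equations (4) and (5) can be simulated directly in base R for a single AR and a single MA matrix, which also gives a quick check that the generated data satisfy both equations. The parameter values, the sparse random weight matrices, and the moran_i helper below are assumptions made for this sketch; nacf is the packaged route to autocorrelation indices.

```r
set.seed(11)
n <- 50
w <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(w) <- 0  # AR matrix W
z <- matrix(rbinom(n * n, 1, 0.1), n, n); diag(z) <- 0  # MA matrix Z
x <- matrix(rnorm(n * 2), n, 2)                         # covariates X
beta <- c(1, -0.5)
theta <- 0.05; psi <- 0.05; sigma <- 0.1                # illustrative values

nu  <- rnorm(n, 0, sigma)
eps <- solve(diag(n) - psi * z, nu)                  # equation (5) solved for eps
y   <- solve(diag(n) - theta * w, x %*% beta + eps)  # equation (4) solved for y

# Both defining equations should hold up to numerical error.
err4 <- max(abs(y - (theta * (w %*% y) + x %*% beta + eps)))
err5 <- max(abs(eps - (psi * (z %*% eps) + nu)))

# Hypothetical helper: Moran's I of a vector v with respect to weights wt.
moran_i <- function(v, wt) {
  d <- as.vector(v) - mean(v)
  (length(d) / sum(wt)) * sum(wt * outer(d, d)) / sum(d^2)
}
i_y <- moran_i(y, w)
```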

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20   Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1
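The potential scale reduction values reported in the diagnostics compare between-chain and within-chain variability. A minimal base R sketch of the basic Gelman and Rubin statistic is given below; the function psrf and the simulated chains are assumptions for illustration, and the exact form used by potscalered.mcmc may differ in detail.

```r
# Hypothetical helper: sqrt(Rhat) for one scalar quantity.
#   draws: matrix with one row per iteration and one column per chain.
psrf <- function(draws) {
  n <- nrow(draws)
  w <- mean(apply(draws, 2, var))   # mean within-chain variance
  b <- n * var(colMeans(draws))     # between-chain variance
  sqrt(((n - 1) / n * w + b / n) / w)
}

set.seed(9)
# Five chains of 20 draws each, all sampling the same distribution.
mixed <- matrix(rnorm(20 * 5), 20, 5)

# Same draws, but the fifth chain is stuck around a different value.
stuck <- mixed + rep(c(0, 0, 0, 0, 5), each = 20)

r_mixed <- psrf(mixed)   # close to 1
r_stuck <- psrf(stuck)   # well above 1
```

Values near 1, as in the summary above, indicate that the chains are statistically indistinguishable for the quantity in question.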

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
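Why this region is "perverse" can be seen from the likelihood ratio of a single report. Writing $\theta$ for the true state of an edge, $y$ for one informant's report, and $e^-$, $e^+$ for that informant's false negative and false positive rates (notation assumed here to match the summary output), a short derivation under the reporting model described above gives:

```latex
\frac{\Pr(y = 1 \mid \theta = 1)}{\Pr(y = 1 \mid \theta = 0)}
  = \frac{1 - e^{-}}{e^{+}} < 1
  \iff e^{-} + e^{+} > 1,
\qquad
\frac{\Pr(y = 0 \mid \theta = 1)}{\Pr(y = 0 \mid \theta = 0)}
  = \frac{e^{-}}{1 - e^{+}} > 1
  \iff e^{-} + e^{+} > 1 .
```

In this region a report of an edge lowers the posterior odds that the edge exists, and a report of no edge raises them, so the informant is negatively informative.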

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
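The behavior of the simpler estimators can be reproduced without sna. The base R sketch below simulates informant reports with fixed error rates (an assumption made for simplicity; the example above draws them from beta distributions) and computes a majority-rule central graph together with the union and intersection LAS, in which the (i, j) cell is taken from the reports of informants i and j only.

```r
set.seed(5)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0                       # true network

em <- 0.375                        # false negative rate (all informants)
ep <- 0.038                        # false positive rate (all informants)

# dat[k, , ] holds informant k's report on the whole network.
dat <- array(0, dim = c(n, n, n))
for (k in 1:n) {
  p <- g * (1 - em) + (1 - g) * ep
  r <- matrix(rbinom(n * n, 1, p), n, n)
  diag(r) <- 0
  dat[k, , ] <- r
}

# Central graph: an edge is present if more than half of informants report it.
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# LAS: combine the reports of the two parties to each potential tie.
las_union <- las_inter <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    if (i != j) {
      las_union[i, j] <- max(dat[i, i, j], dat[j, i, j])
      las_inter[i, j] <- min(dat[i, i, j], dat[j, i, j])
    }
  }
}

acc <- function(est) mean(est == g)   # fraction of cells recovered
acc_central <- acc(central)
acc_union <- acc(las_union)
acc_inter <- acc(las_inter)
```

With false negatives far more common than false positives, requiring both parties to report a tie discards many true edges, which is why the intersection rule fares worst.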

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

closeness scores will be 0 for graphs having multiple weak components. Due to this fragility, closeness is less often deployed than the other two of Freeman's measures.

Another important family of measures includes the eigenvector and Bonacich power centralities, both of which are based on spectral properties of the graph adjacency matrix. Eigenvector centrality (implemented in sna via evcent) is simply the absolute value of the principal eigenvector of A (where A is the graph adjacency matrix). This can be interpreted variously as a measure of "coreness" (or membership in the largest dense cluster), "recursive" or "reflected" degree (i.e., v is central to the extent to which it has many ties to other central nodes), or of the ability of v to reach other vertices through a multiplicity of short walks. Bonacich (1987) extended this notion via a measure equal to $c_{bp}(G) = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where a solution exists. This index approaches the eigenvector centrality as β approaches the reciprocal of the principal eigenvalue of A, and degree as β approaches 0. Setting β < 0 reverses the sense of the dependence of centrality scores across vertices: where β is negative, vertices become more central by being attached to less central alters. This effect was intended to capture the behavior of equilibrium payoffs in bilateral exchange networks with credible exclusion threats; as with the positive case, parameter magnitude in this instance reflects the degree of weight afforded distant edges. The bonpow command in sna implements the Bonacich power measure for user-specified values of β. The scaling parameter α is by convention set so as to result in a centrality vector of length equal to |V|; in general, it should be remembered that this measure is uniquely defined only up to a rescaling operation. Closely related to evcent and bonpow are prestige (which calculates various prestige measures) and infocent (which calculates the information centrality of Stephenson and Zelen 1989). Although a range of indices is included within prestige, all measure the extent to which individuals secure the direct or indirect nomination of others; several variants of eigenvector centrality are included for this purpose. Information centrality provides an indication of the extent to which each individual has a large number of short walks to other actors in the network. It is similar to eigenvector centrality in being walk-based, but weights short walks more heavily (and long walks less heavily) than the former.
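The Bonacich formula can be sketched directly in base R. This is only an illustration and not the sna implementation: bonacich_power is a hypothetical helper name, and the scaling (α chosen so that the sum of squared scores equals |V|) is an assumption made for this sketch.

```r
# Base-R sketch of Bonacich power centrality; NOT sna's bonpow().
# bonacich_power() is a hypothetical helper name used for illustration.
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  # Unscaled scores: (I - beta * A)^{-1} A 1
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))
  # Assumed scaling: choose alpha so the sum of squared scores equals n
  as.vector(raw * sqrt(n / sum(raw^2)))
}

# Small directed example: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
A <- matrix(0, 3, 3)
A[1, 2] <- A[1, 3] <- A[2, 3] <- A[3, 1] <- 1

b0 <- bonacich_power(A, beta = 0)
outdeg <- rowSums(A)
b0 / outdeg   # constant ratio: proportional to outdegree when beta = 0
sum(b0^2)     # equals the number of vertices under the assumed scaling
```

With β = 0 the inverse drops out and the scores reduce to rescaled outdegree, which is the limiting behavior described above.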

An example of a more specialized family of node-level indices is given by the Gould and Fernandez (1989) brokerage scores. The total brokerage of a given vertex v is defined as the number of ordered pairs $(v', v'')$ such that $(v', v), (v, v'') \in E$ and $(v', v'') \notin E$; that is, the number of pairs for which v serves as a local bridge. Now, let us posit a vector of states s on V such that $s_i$ is the state of $v_i \in V$. ("State" in this case can be any exogenous covariate, although Gould and Fernandez initially intended it to be a categorical indicator of group membership.) Gould and Fernandez define five specific types of brokerage (or brokerage roles), based on the states of the three vertices within a locally bridged pair. For an ordered triad $(v_i, v_j, v_k)$ with brokering vertex $v_j$, the possible brokerage roles are coordinating ($s_i = s_j = s_k$), itinerant ($s_i = s_k$, $s_i \neq s_j$), gatekeeping ($s_j = s_k$, $s_i \neq s_j$), representative ($s_i = s_j$, $s_j \neq s_k$), and liaison ($s_i \neq s_j$, $s_j \neq s_k$, $s_i \neq s_k$). The brokerage score for vertex v with respect to a particular role is defined as the number of ordered triads of the appropriate type for which v is a broker. The brokerage function computes these (and total) brokerage scores for all vertices, as well as the total amount of brokerage within each role performed throughout the network. First and second moments for brokerage scores under a null hypothesis of random association (holding fixed s and the expected density) are also provided, as well as the z-tests suggested by Gould and Fernandez. It should be cautioned that the authors did not prove that the statistics in question are asymptotically normal under the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.
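The role definitions above translate into a simple triple loop. The following base-R sketch is a brute-force illustration only, not the algorithm used by sna's brokerage function; brokerage_counts is a hypothetical helper name, and the moments and z-tests are not computed.

```r
# Base-R sketch of Gould-Fernandez brokerage counts; NOT sna's brokerage().
# brokerage_counts() is a hypothetical helper name used for illustration.
brokerage_counts <- function(A, s) {
  n <- nrow(A)
  roles <- c("coordinator", "itinerant", "gatekeeper",
             "representative", "liaison")
  out <- matrix(0, n, length(roles), dimnames = list(NULL, roles))
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      for (k in seq_len(n)) {
        if (length(unique(c(i, j, k))) < 3) next
        # j brokers the ordered pair (i, k): i -> j -> k, no direct i -> k tie
        if (A[i, j] == 1 && A[j, k] == 1 && A[i, k] == 0) {
          if (s[i] == s[j] && s[j] == s[k]) {
            role <- "coordinator"
          } else if (s[i] == s[k]) {
            role <- "itinerant"
          } else if (s[j] == s[k]) {
            role <- "gatekeeper"
          } else if (s[i] == s[j]) {
            role <- "representative"
          } else {
            role <- "liaison"
          }
          out[j, role] <- out[j, role] + 1
        }
      }
    }
  }
  cbind(out, total = rowSums(out))
}

# Directed path 1 -> 2 -> 3; vertices 1 and 2 share a group, 3 does not,
# so vertex 2 acts once as a representative.
A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- 1
brokerage_counts(A, s = c(1, 1, 2))
```

The cubic loop is adequate for small graphs; for larger inputs a vectorized or compiled routine would be preferable.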

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary's graph centrality, eigenvector centrality, and information centrality on the same network.

R> dat <- rgraph(10)

R> degree(dat, cmode = "indegree")
[1] 4 4 8 2 4 5 4 4 3 6

R> degree(dat, cmode = "outdegree")
[1] 6 3 5 2 5 4 4 4 5 6

R> degree(dat)
[1] 10  7 13  4  9  9  8  8  8 12

R> closeness(dat)
[1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
[8] 0.6428571 0.6923077 0.7500000

R> betweenness(dat)
[1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
[7]  2.4500000  2.0333333  2.4166667  8.1833333

R> stresscent(dat)
[1] 21  6 27  1 14 15  6  7  7 21

R> graphcent(dat)
[1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
[8] 0.5000000 0.5000000 0.5000000

R> evcent(dat)
[1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
[8] 0.2734192 0.3642163 0.4121985


R> infocent(dat)
[1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
[9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
[1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
[8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+     evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE

R> bonpow(dat, exponent = -0.5)
[1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
[7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)

R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores, by group, for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present; z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $i, j, k \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1)$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2)$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores; for instance, when all vertices are automorphically equivalent) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.
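As a small check on the two forms of the Freeman index, the following base-R sketch computes degree centralization on an undirected star. It is an illustration only and does not call sna; freeman_centralization is a hypothetical helper name, and the normalizing constant (n - 1)(n - 2) is the standard maximum for undirected degree without loops.

```r
# Base-R sketch of Freeman centralization; NOT sna's centralization().
# freeman_centralization() is a hypothetical helper name.
freeman_centralization <- function(scores) {
  # Equation (1): total deviation from the maximum observed score
  sum(max(scores) - scores)
}

# Undirected star on 5 vertices (vertex 1 is the hub)
n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1
star[-1, 1] <- 1
deg <- rowSums(star)

raw <- freeman_centralization(deg)   # 3 + 3 + 3 + 3 = 12
alt <- n * (max(deg) - mean(deg))    # Equation (2) gives the same value
# Normalize by the maximum attainable deviation for undirected degree;
# the star attains it, so the normalized score is 1 here.
normalized <- raw / ((n - 1) * (n - 2))
```

Other centrality measures need their own theoretical maximum deviation, which is why a user-supplied centrality function must be able to report it.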

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward: calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))

R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

Footnote 3: $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
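The difference between dyadic and edgewise reciprocity is easy to see on a hand-built digraph. The sketch below uses base R only and is not how gden or grecip are implemented; it assumes a loopless digraph with no missing data, and the variable names are illustrative.

```r
# Base-R sketch of density and two reciprocity variants for a loopless
# digraph with no missing data; NOT sna's gden()/grecip().
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1   # mutual dyad {1, 2}
A[1, 3] <- 1              # asymmetric dyad {1, 3}
A[3, 4] <- 1              # asymmetric dyad {3, 4}

n <- nrow(A)
density <- sum(A) / (n * (n - 1))          # 4 of 12 possible edges

up <- upper.tri(A)
mutual <- sum(A[up] == 1 & t(A)[up] == 1)  # ties in both directions
null   <- sum(A[up] == 0 & t(A)[up] == 0)  # no ties in either direction
dyads  <- n * (n - 1) / 2

dyadic_recip   <- (mutual + null) / dyads  # symmetric dyads: (1 + 3) / 6
edgewise_recip <- 2 * mutual / sum(A)      # reciprocated edges: 2 / 4
```

Because null dyads count as symmetric, the dyadic measure is high for sparse graphs even when few edges are reciprocated, which matches the first two graphs in the transcript above.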

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+     n <- NROW(dat)
+     if (tmaxdev)
+         return((n - 1) * choose(n - 1, 2))
+     odeg <- degree(dat, cmode = "outdegree")
+     choose(odeg, 2)
+ }

R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let G = (V, E) be a (possibly directed) graph of order N, and let d(i, j) be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
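The structure statistics follow directly from a geodesic distance matrix. The base-R sketch below is an illustration under simple assumptions (an unweighted digraph given as a 0/1 adjacency matrix); geodesic_dist and structure_stats are hypothetical helper names, and a Floyd-Warshall pass stands in for sna's geodist.

```r
# Base-R sketch of structure statistics; NOT sna's implementation.
# geodesic_dist() and structure_stats() are hypothetical helper names.
geodesic_dist <- function(A) {
  n <- nrow(A)
  D <- ifelse(A > 0, 1, Inf)   # direct ties have length 1
  diag(D) <- 0
  # Floyd-Warshall: allow each vertex m in turn as an intermediate stop
  for (m in seq_len(n)) {
    D <- pmin(D, outer(D[, m], D[m, ], "+"))
  }
  D
}

structure_stats <- function(A) {
  N <- nrow(A)
  D <- geodesic_dist(A)
  # s_i = N^-2 * number of ordered pairs (j, k) with d(j, k) <= i
  s <- sapply(0:(N - 1), function(i) sum(D <= i) / N^2)
  names(s) <- 0:(N - 1)
  s
}

# Directed 3-cycle 1 -> 2 -> 3 -> 1: s = (1/3, 2/3, 1)
A <- matrix(0, 3, 3)
A[1, 2] <- A[2, 3] <- A[3, 1] <- 1
structure_stats(A)
```

Unreachable pairs keep distance Inf and therefore never enter the counts, so the series levels off below 1 for disconnected graphs.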

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
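To make the CUG logic concrete, the following base-R sketch tests the number of mutual dyads against graphs drawn uniformly given order and edge count. It is a minimal illustration, not sna's cugtest: dyad_census and cug_edges are hypothetical helper names, and the null draw simply permutes the off-diagonal cells of the adjacency matrix.

```r
# Base-R sketch of a CUG test conditioning on order and edge count.
# dyad_census() and cug_edges() are hypothetical helpers, NOT sna functions.
dyad_census <- function(A) {
  up <- upper.tri(A)
  ties <- A[up] + t(A)[up]     # 0, 1, or 2 ties per dyad
  c(Mut = sum(ties == 2), Asym = sum(ties == 1), Null = sum(ties == 0))
}

cug_edges <- function(A, stat, reps = 1000) {
  n <- nrow(A)
  off <- row(A) != col(A)      # off-diagonal cells (no loops)
  obs <- stat(A)
  sims <- replicate(reps, {
    H <- matrix(0, n, n)
    H[off] <- sample(A[off])   # uniform draw with the same edge count
    stat(H)
  })
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(1)
A <- matrix(0, 6, 6)
A[cbind(1:5, 2:6)] <- 1        # path 1 -> 2 -> ... -> 6
A[cbind(2:6, 1:5)] <- 1        # reciprocate every tie
dyad_census(A)                 # 5 mutual, 0 asymmetric, 10 null dyads
cug_edges(A, function(G) dyad_census(G)[["Mut"]])
```

Since every edge in the observed graph is reciprocated, no random graph with the same edge count can have more mutual dyads, so the lower-tail p-value is 1 and the upper-tail p-value is small.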

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+     dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+     cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1, , ])
$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]

R> g4[2, , ] <- g2[1, , ]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+     g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v′ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when v and v′ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice exact structural equivalence is fairly rare (isolates and pendants being two im-portant counterexamples) Nevertheless one may identify vertices which are approximatelystructurally equivalent in that their neighborhoods are ldquosimilarrdquo in some well-defined senseCommon means of assessing similarity between two vertices are product-moment correlationsEuclidean distances Hamming distances or gamma coefficients applied to their respectiverows and columns within the graph adjacency matrix Within sna sedist computes suchindices for all pairs of vertices on one or more input graphs Once these similaritiesdifferencesare calculated conventional multivariate data analysis procedures (eg hierarchical clusteringor multidimensional scaling) can be used to evaluate the extent of reduction which is possible


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
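The density block type described above can be illustrated with a short base R sketch. The helper block_density is a hypothetical name introduced here for illustration; it is not the internal routine used by blockmodel, and it assumes a membership vector has already been obtained (e.g., via cutree).

```r
# Density blockmodel reduction (illustrative sketch only).
# Each block holds the mean of the adjacency matrix cells linking
# class a to class b, neglecting the diagonal.
block_density <- function(adj, membership) {
  classes <- sort(unique(membership))
  labels <- as.character(classes)
  dens <- matrix(NA_real_, length(classes), length(classes),
                 dimnames = list(labels, labels))
  diag(adj) <- NA                       # loops are not counted
  for (a in seq_along(classes)) {
    for (b in seq_along(classes)) {
      block <- adj[membership == classes[a], membership == classes[b],
                   drop = FALSE]
      dens[a, b] <- mean(block, na.rm = TRUE)
    }
  }
  dens
}

# Two exact structural equivalence classes: {1, 2} send ties to {3, 4}.
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1
membership <- c(1, 1, 2, 2)
bm <- block_density(adj, membership)
```

Because the classes here are exactly structurally equivalent, every block density is either 0 or 1. Note that under this sketch a class containing a single vertex yields NaN for its diagonal block, since that block contains no off-diagonal cells.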

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
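The expansion step for density blocks can be sketched in base R as follows. The function expand_density and its arguments are hypothetical names used only for this illustration; the sketch assumes loops are excluded and is not a reproduction of blockmodel.expand itself.

```r
# Expansion of a density block image into a random digraph (sketch only).
# dens:  class-by-class matrix of block densities
# sizes: number of vertices to generate in each class
expand_density <- function(dens, sizes) {
  membership <- rep(seq_along(sizes), times = sizes)
  # Bernoulli parameter matrix: cell (i, j) takes the density of the block
  # linking i's class to j's class.
  p <- dens[membership, membership, drop = FALSE]
  n <- length(membership)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0                          # no loops
  g
}

set.seed(1)
# Degenerate image: class 1 always sends to class 2, nothing else occurs.
dens <- matrix(c(0, 1,
                 0, 0), 2, 2, byrow = TRUE)
g_new <- expand_density(dens, sizes = c(3, 2))
```

Since the class sizes are supplied independently of the image, the expanded graph need not have the same order as the graph from which the blockmodel was formed.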

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
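The boolean inner product interpretation of graph composition can be written directly in base R. The function compose below is a hypothetical helper for illustration, and is offered as a sketch of the definition rather than as the implementation behind sna's operator.

```r
# Graph composition via the boolean product of adjacency matrices (sketch).
# (v, v') is an edge of the result iff some v'' has (v, v'') in g1
# and (v'', v') in g2.
compose <- function(g1, g2) {
  (g1 %*% g2 > 0) * 1
}

g1 <- matrix(0, 3, 3)
g1[1, 2] <- 1                 # 1 -> 2 in the first graph
g2 <- matrix(0, 3, 3)
g2[2, 1] <- 1                 # 2 -> 1 in the second graph
g2[2, 3] <- 1                 # 2 -> 3 in the second graph
comp <- compose(g1, g2)
```

In this toy case neither input graph has a loop, yet the composition contains the loop (1, 1) in addition to the edge (1, 3), which illustrates why the diagonal should be retained.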

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| \left(|V| - 1\right)} \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V| (|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
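As a check on the definitions in Equation 3, the graph covariance, graph correlation, and Hamming distance for a pair of directed graphs can be computed in a few lines of base R. The function names graph_cov, graph_cor, and hamming are hypothetical and used only for this sketch; it assumes loopless digraphs on a common, identically labeled vertex set.

```r
# Graph covariance per Equation 3: average over the |V|(|V| - 1) ordered
# dyads (i, j), i != j, of the product of centered edge variables.
graph_cov <- function(a, b) {
  off <- row(a) != col(a)               # off-diagonal cells only
  mean((a[off] - mean(a[off])) * (b[off] - mean(b[off])))
}

# Graph correlation: covariance scaled by the graph standard deviations.
graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}

# Hamming distance: number of ordered dyads on which the graphs differ.
hamming <- function(a, b) {
  off <- row(a) != col(a)
  sum(a[off] != b[off])
}

a <- matrix(0, 3, 3)
a[1, 2] <- 1
a[2, 3] <- 1
a_comp <- 1 - a                          # complement graph
diag(a_comp) <- 0

r_self <- graph_cor(a, a)                # identical graphs
r_comp <- graph_cor(a, a_comp)           # graph versus its complement
h_comp <- hamming(a, a_comp)
```

A graph is perfectly correlated with itself and perfectly negatively correlated with its complement, while the Hamming distance to the complement equals the number of ordered dyads.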

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
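The logic of a QAP test can be sketched in base R by repeatedly relabeling the vertices of one graph (permuting its rows and columns simultaneously) and recomputing the statistic of interest. The function qap_cor_test and its return structure are hypothetical and written only for this illustration; they do not mirror the interface of qaptest.

```r
# QAP permutation test for the graph correlation (illustrative sketch).
qap_cor_test <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)                      # ignore the diagonal
  stat <- function(x, y) cor(x[off], y[off])   # graph correlation
  obs <- stat(a, b)
  null <- replicate(reps, {
    p <- sample(nrow(a))                       # random relabeling of vertices
    stat(a[p, p], b)                           # rows and columns move together
  })
  list(observed = obs,
       null = null,
       p_ge = mean(null >= obs),               # Pr(null >= observed)
       p_le = mean(null <= obs))               # Pr(null <= observed)
}

set.seed(42)
n <- 10
a <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(a) <- 0
res <- qap_cor_test(a, a, reps = 200)          # a graph compared with itself
```

Because relabeling preserves the unlabeled structure of the permuted graph, the reference distribution holds each graph's structure fixed and varies only the assignment of individuals to positions. Here the observed correlation of a graph with itself cannot be exceeded by any relabeling.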


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> nl <- netlm(y, x)

R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))

R> y <- rgraph(20, tprob = yp)

R> nl <- netlogit(y, x)

R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
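The conditional nomination probability above is simple to evaluate numerically. The following base R sketch uses a hypothetical helper, bn_tie_prob, whose argument names are introduced here for illustration; the probabilities in the example are arbitrary.

```r
# Conditional probability of a nomination under a biased net model (sketch).
# p_base: baseline probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      vector of counts s_k of potential occurrences of each bias event
#         for the dyad in question, given the current trace state
bn_tie_prob <- function(p_base, p_bias, s) {
  # The tie is absent only if the baseline event and every potential
  # bias event all fail to occur.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# One potential occurrence of the first bias event, none of the second:
# 1 - 0.9 * 0.5 = 0.55
p_with_bias <- bn_tie_prob(0.1, p_bias = c(0.5, 0.2), s = c(1, 0))

# With no potential bias events, the baseline probability is recovered.
p_baseline <- bn_tie_prob(0.1, p_bias = c(0.5, 0.2), s = c(0, 0))
```

The form of the expression makes clear that bias events can only raise the nomination probability above its baseline value.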

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

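$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$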

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
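To see how Equations 4 and 5 generate data, one can solve them for $\varepsilon$ and $y$ in turn, here with a single AR and a single MA weight matrix. The following base R sketch makes several assumptions that are not part of the text above: the weight matrices are row-normalized (so that the linear systems are guaranteed to be solvable for the chosen coefficients), and all parameter values are arbitrary.

```r
# Simulating from a linear network autocorrelation model (sketch only).
set.seed(7)
n <- 30

# Random weight matrices, row-normalized (an assumption of this sketch).
make_w <- function(n, density = 0.2) {
  w <- matrix(rbinom(n * n, 1, density), n, n)
  diag(w) <- 0
  w / pmax(rowSums(w), 1)               # isolates keep an all-zero row
}
W <- make_w(n)                          # AR interaction matrix
Z <- make_w(n)                          # MA interaction matrix

theta <- 0.3                            # AR coefficient
psi   <- 0.2                            # MA coefficient
X     <- cbind(1, rnorm(n))             # intercept plus one covariate
beta  <- c(1, -2)
nu    <- rnorm(n, 0, 0.1)               # iid disturbances

# Equation 5: eps = psi Z eps + nu  =>  (I - psi Z) eps = nu
eps <- solve(diag(n) - psi * Z, nu)

# Equation 4: y = theta W y + X beta + eps  =>  (I - theta W) y = X beta + eps
y <- solve(diag(n) - theta * W, X %*% beta + eps)
```

Substituting the simulated values back into Equations 4 and 5 recovers both identities up to numerical error, which provides a quick check on the algebra.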


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)

R> ep <- rbeta(20, 1, 25)

R> em <- rbeta(20, 15, 25)

R> dat <- array(dim = c(20, 20, 20))

R> for(i in 1:20)

+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))

R> pnet <- matrix(0.5, ncol = 20, nrow = 20)

R> pem <- matrix(nrow = 20, ncol = 2)

R> pem[,1] <- 2

R> pem[,2] <- 11

R> pep <- matrix(nrow = 20, ncol = 2)

R> pep[,1] <- 2

R> pep[,2] <- 11

R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,

+    epprior = pep, burntime = 300, draws = 100)

R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
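The behavior of the central graph estimator can be explored without sna by simulating the same kind of reporting process and taking a cellwise majority vote, which is the graph minimizing the total Hamming distance to a set of binary graphs. The following base R sketch uses arbitrary, common error rates for all informants and an odd number of informants (to avoid ties); these choices, and all object names, are assumptions of the illustration rather than features of the example above.

```r
# Central graph (cellwise majority) estimate from noisy reports (sketch).
set.seed(3)
n <- 15                                   # vertices
m <- 21                                   # informants (odd, so no tied votes)

g <- matrix(rbinom(n * n, 1, 0.5), n, n)  # "true" network
diag(g) <- 0

fn <- 0.375                               # false negative rate (assumed common)
fp <- 0.04                                # false positive rate (assumed common)

reports <- array(0, dim = c(m, n, n))
for (i in seq_len(m)) {
  # True ties are reported with probability 1 - fn, absent ties with fp.
  p <- g * (1 - fn) + (1 - g) * fp
  reports[i, , ] <- matrix(rbinom(n * n, 1, p), n, n)
}

# Cellwise majority vote across informants.
central <- (apply(reports, c(2, 3), mean) > 0.5) * 1
diag(central) <- 0
accuracy <- mean(central == g)
```

Raising fn toward 0.5 in this sketch shows the degradation noted above: once a true tie is reported by fewer than half of the informants on average, the majority vote tends to omit it.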

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)

R> w2 <- rgraph(50)

R> x <- matrix(rnorm(50 * 5), 50, 5)

R> r1 <- 0.2

R> r2 <- 0.3

R> sigma <- 0.1

R> beta <- rnorm(5)

R> nu <- rnorm(50, 0, sigma)

R> e <- qr.solve(diag(50) - r2 * w2, nu)

R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)

R> fit <- lnam(y, x, w1, w2)

R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


the null model, and hence the statistical foundation for their associated tests is somewhat dubious; when in doubt, it may be wise to perform a simulation-based conditional uniform graph or permutation test.

To illustrate the use of node-level index routines within sna, we compute various centrality indices on a random digraph generated by rgraph. In the case of the Bonacich power measure, we also illustrate the impact of various decay parameter settings. For comparison, we begin by showing indegree, outdegree, total degree, closeness, betweenness, stress, Harary’s graph centrality, eigenvector centrality, and information centrality on the same network:

R> dat <- rgraph(10)
R> degree(dat, cmode = "indegree")
 [1] 4 4 8 2 4 5 4 4 3 6
R> degree(dat, cmode = "outdegree")
 [1] 6 3 5 2 5 4 4 4 5 6
R> degree(dat)
 [1] 10  7 13  4  9  9  8  8  8 12
R> closeness(dat)
 [1] 0.7500000 0.5625000 0.6923077 0.5000000 0.6923077 0.6428571 0.6000000
 [8] 0.6428571 0.6923077 0.7500000
R> betweenness(dat)
 [1]  8.7666667  2.2000000 11.3500000  0.3333333  5.7833333  6.4833333
 [7]  2.4500000  2.0333333  2.4166667  8.1833333
R> stresscent(dat)
 [1] 21  6 27  1 14 15  6  7  7 21
R> graphcent(dat)
 [1] 0.5000000 0.3333333 0.5000000 0.3333333 0.5000000 0.5000000 0.3333333
 [8] 0.5000000 0.5000000 0.5000000
R> evcent(dat)
 [1] 0.3967806 0.2068905 0.3482775 0.1443617 0.3098004 0.3179091 0.2885521
 [8] 0.2734192 0.3642163 0.4121985
R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score’s behavior is controlled by a decay parameter (set by the exponent argument) which determines the nature and strength of ego’s dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow’s most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645
R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+     evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE
R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
    w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
          b_O          t
[1] -1.186381  0.8682544
[2] -1.186381 -1.6099084
[3] -1.186381 -0.3708270
[4] -1.186381 -0.7838541

Group ID: 2
    w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
             t
[1] -0.7838541
[2]  1.4877951

Group ID: 3
    w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
          b_O           t
[1] 3.0624213  2.31384939
[2] 0.6345344  0.45522729
[3] 0.6345344  0.04220016
[4] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the “edgewise” reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that “a friend of a friend is a friend”: where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of “third party support”: direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^{*}(G) - \bar{c}(G) \right], \tag{2}$$

where $c^{*}(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores²) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$)³, although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

² For instance, when all vertices are automorphically equivalent.
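As a concrete check of Equations (1) and (2), the following base R sketch computes the raw Freeman centralization of a degree vector in both forms. The helper names (freeman_raw, freeman_rewritten) and the five-vertex star are illustrative assumptions rather than part of sna; the package's own centralization function additionally handles normalization.

```r
# Raw Freeman centralization, Equation (1): total deviation from the maximum.
freeman_raw <- function(scores) sum(max(scores) - scores)

# Equivalent form, Equation (2): |V| * (maximum score - mean score).
freeman_rewritten <- function(scores) {
  length(scores) * (max(scores) - mean(scores))
}

# Hypothetical example: undirected star on 5 vertices (vertex 1 is the hub).
star <- matrix(0, 5, 5)
star[1, -1] <- 1
star[-1, 1] <- 1
deg <- rowSums(star)  # degree scores: 4 for the hub, 1 for each leaf

raw_1 <- freeman_raw(deg)
raw_2 <- freeman_rewritten(deg)
print(c(eq1 = raw_1, eq2 = raw_2))
```

Both forms give the same value here, since each of the four leaves deviates from the hub's score by 3.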

In addition to the above, sna includes functions for GLIs such as Krackhardt’s (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna’s GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and “census” variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

³ $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0
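The density, reciprocity, and transitivity definitions given earlier can be related to a hand computation on a very small graph. The following base R sketch is an illustration only: the four-vertex digraph and all variable names are assumptions, no sna functions are called, and the weak transitivity figure is computed as the share of two-paths that are closed by a direct tie, which is one reading of the definition for loop-free digraphs.

```r
# Hypothetical 4-vertex digraph with edges 1<->2, 1->3, 2->3, 3->4 (no loops).
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- adj[1, 3] <- adj[2, 3] <- adj[3, 4] <- 1
n <- nrow(adj)

# Density: observed edges over the n * (n - 1) possible directed edges.
dens <- sum(adj) / (n * (n - 1))

# Dyad counts, taken over unordered pairs i < j.
upper <- upper.tri(adj)
mutual <- sum((adj * t(adj))[upper])
null_dyads <- sum(((1 - adj) * (1 - t(adj)))[upper])
asymmetric <- choose(n, 2) - mutual - null_dyads

# Dyadic reciprocity: fraction of dyads that are symmetric (mutual or null).
dyadic_recip <- (mutual + null_dyads) / choose(n, 2)

# Edgewise reciprocity: fraction of edges that are reciprocated.
edgewise_recip <- 2 * mutual / sum(adj)

# Weak transitivity: among two-paths i -> j -> k with i != k,
# the fraction for which the direct tie i -> k is also present.
two_paths <- adj %*% adj
diag(two_paths) <- 0
weak_trans <- sum(two_paths * adj) / sum(two_paths)

print(c(density = dens, dyadic = dyadic_recip,
        edgewise = edgewise_recip, weak_transitivity = weak_trans))
```

In this example the dyadic and edgewise reciprocities differ (one half versus two fifths) because the two null dyads count as symmetric under the dyadic definition but contribute no edges to the edgewise one.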

centralization’s usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R’s apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+     n <- NROW(dat)
+     if (tmaxdev)
+         return((n - 1) * choose(n - 1, 2))
+     odeg <- degree(dat, cmode = "outdegree")
+     choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization “for free” when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine’s (1964) structure statistics. Let G = (V, E) be a (possibly directed) graph of order N, and let d(i, j) be the geodesic distance from vertex i to vertex j in G. The “structure statistics” of G are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j,k) \le i\right)$$

and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
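The structure statistics defined above can be sketched directly from the formula in base R. The functions below are illustrative assumptions (hypothetical names, a Floyd-Warshall pass for geodesic distances, unreachable pairs given distance Inf); they are not the sna implementation, which provides geodist and structure.statistics for this purpose.

```r
# All-pairs geodesic distances via Floyd-Warshall; unreachable pairs get Inf.
geodesic_distances <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (m in seq_len(n)) {
    # outer(...)[i, k] is the length of the best route i -> m -> k found so far
    d <- pmin(d, outer(d[, m], d[m, ], "+"))
  }
  d
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), i = 0..N-1.
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_distances(adj)
  s <- vapply(0:(n - 1), function(i) sum(d <= i) / n^2, numeric(1))
  names(s) <- 0:(n - 1)
  s
}

# Hypothetical example: directed 3-cycle 1 -> 2 -> 3 -> 1.
cyc <- matrix(0, 3, 3)
cyc[1, 2] <- cyc[2, 3] <- cyc[3, 1] <- 1
s_cyc <- structure_stats(cyc)
print(s_cyc)
```

For the directed 3-cycle, each vertex reaches itself at distance 0, one other vertex at distance 1, and the remaining vertex at distance 2, so the series rises in equal steps to 1.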

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one’s data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic t for graph G is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where H is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
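The logic of a one-tailed CUG test can be sketched in a few lines of base R. This is a minimal Monte Carlo illustration under stated assumptions, not the cugtest implementation: the function names are hypothetical, the test statistic is the number of mutual dyads, and the null distribution is uniform over loop-free digraphs with the same order and number of edges as the observed graph.

```r
set.seed(1)

# Test statistic t(.): number of mutual dyads in a loop-free digraph.
mutual_count <- function(adj) sum(adj * t(adj)) / 2

# Uniform draw among loop-free digraphs with n vertices and m edges.
draw_uniform_digraph <- function(n, m) {
  adj <- matrix(0, n, n)
  off_diag <- which(row(adj) != col(adj))
  adj[sample(off_diag, m)] <- 1
  adj
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G)).
cug_p_values <- function(adj, stat, reps = 500) {
  n <- nrow(adj)
  m <- sum(adj)
  observed <- stat(adj)
  simulated <- replicate(reps, stat(draw_uniform_digraph(n, m)))
  c(upper = mean(simulated >= observed), lower = mean(simulated <= observed))
}

# Hypothetical observed graph: three mutual pairs (1<->2, 3<->4, 5<->6).
obs <- matrix(0, 6, 6)
obs[1, 2] <- obs[2, 1] <- obs[3, 4] <- obs[4, 3] <- obs[5, 6] <- obs[6, 5] <- 1
p <- cug_p_values(obs, mutual_count)
print(p)
```

Because six edges can form at most three mutual dyads, every simulated graph satisfies t(H) <= t(G), so the lower-tail estimate is exactly 1, while the upper-tail estimate reflects how rarely a uniform draw is fully reciprocated.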

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

Rgt g1 lt- rgbn(50 10 param = list(pi = 0 sigma = 0 rho = 0 d = 017))

Rgt apply(dyadcensus(g1) 2 mean)

Mut Asym Null100 1284 3116

Rgt apply(triadcensus(g1) 2 mean)

003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U4016 4848 350 552 580 960 194 186 184 072 012 008 008

Journal of Statistical Software 29

120C 210 300030 000 000

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]

R> g4[2,,] <- g2[1,,]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:   0
        Mean: -0.001288889
        3rdQ:  0.06666667
        Max:   0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ:  0.1555556
        Med:   0.2222222
        Mean:  0.2215333
        3rdQ:  0.2888889
        Max:   0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
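As a minimal illustration of the definition above, the following base R sketch tests whether two vertices of a digraph are exactly structurally equivalent. The function name struct_equiv is hypothetical; it simply compares out- and in-neighborhoods after setting aside the two vertices themselves.

```r
# TRUE iff vertices v and w of the 0/1 adjacency matrix A have identical
# out-neighbors and in-neighbors, ignoring v and w themselves.
struct_equiv <- function(A, v, w) {
  keep <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, keep] == A[w, keep]) &&   # same out-neighborhood
    all(A[keep, v] == A[keep, w])    # same in-neighborhood
}

# Vertices 1 and 2 both send to 3 and both receive from 4; 3 sends to 4.
A <- matrix(c(0, 0, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 1, 0, 0), 4, 4, byrow = TRUE)
se_12 <- struct_equiv(A, 1, 2)
se_13 <- struct_equiv(A, 1, 3)
```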

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
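The distance-then-cluster workflow can be sketched with base R alone. The helper se_hamming below is hypothetical: it computes one of the dissimilarities mentioned above (a Hamming distance on rows and columns) and hands the result to hclust. It is meant only to show the shape of the computation, and is not a description of how sedist or equiv.clust are implemented.

```r
# Hamming distance between the row-and-column profiles of every vertex pair,
# ignoring the entries that involve the two vertices being compared.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    keep <- setdiff(seq_len(n), c(i, j))
    D[i, j] <- D[j, i] <-
      sum(A[i, keep] != A[j, keep]) + sum(A[keep, i] != A[keep, j])
  }
  D
}

A <- matrix(c(0, 0, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 1, 0, 0), 4, 4, byrow = TRUE)
D <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")
pos <- cutree(hc, k = 3)   # three positions: {1, 2}, {3}, {4}
```

A distance of zero corresponds to exact structural equivalence; larger cut heights merge vertices that are only approximately equivalent.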

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
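For the density block type, the reduction step amounts to averaging cells within each pair of classes. The base R sketch below uses a hypothetical helper, block_density, and one particular convention for within-class blocks (the diagonal is excluded, and a singleton class yields NA); the conventions used by blockmodel itself may differ.

```r
# Density block image of a 0/1 adjacency matrix A under class vector cls.
block_density <- function(A, cls) {
  k <- sort(unique(cls))
  B <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
  for (a in seq_along(k)) for (b in seq_along(k)) {
    blk <- A[cls == k[a], cls == k[b], drop = FALSE]
    if (a == b) {
      n_a <- nrow(blk)               # within-class block: skip the diagonal
      B[a, b] <- if (n_a > 1) (sum(blk) - sum(diag(blk))) / (n_a * (n_a - 1)) else NA
    } else {
      B[a, b] <- mean(blk)           # between-class block: plain cell mean
    }
  }
  B
}

A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1                     # every class-1 vertex sends to class 2
cls <- c(1, 1, 2, 2)
B <- block_density(A, cls)
```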

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
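The expansion step for density blocks can be written in a few lines of base R. The function expand_density is a hypothetical stand-in, under the assumption that the block image contains no missing densities; it is not the blockmodel.expand implementation.

```r
# Draw a digraph from a density block image B, with sizes[k] vertices in class k.
expand_density <- function(B, sizes) {
  cls <- rep(seq_len(nrow(B)), times = sizes)   # class of each new vertex
  P <- B[cls, cls, drop = FALSE]                # expanded edge-probability matrix
  n <- length(cls)
  G <- matrix(rbinom(n * n, 1, P), n, n)        # independent Bernoulli edges
  diag(G) <- 0                                  # no self-ties
  G
}

# A deterministic image: class 1 always sends to class 2, nothing else occurs.
B <- matrix(c(0, 1,
              0, 0), 2, 2, byrow = TRUE)
G <- expand_density(B, sizes = c(2, 3))
```

Since the class sizes are supplied by the caller, the expanded graph need not have the same order as the graph from which the image was estimated.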

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
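The boolean inner product mentioned above is easy to reproduce in base R. The function compose is a hypothetical name used for illustration; the example also shows a loop arising from two loop-free graphs.

```r
# Boolean matrix product: edge (v, v') exists iff some v'' has
# (v, v'') in the first graph and (v'', v') in the second.
compose <- function(A, B) ((A %*% B) > 0) * 1

A <- matrix(0, 3, 3); A[1, 2] <- 1   # G:  1 -> 2
B <- matrix(0, 3, 3); B[2, 1] <- 1   # G': 2 -> 1
C <- compose(A, B)                   # contains only the loop 1 -> 1
```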

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

\[
\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \tag{3}
\]

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
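Equation 3 and the associated correlation and Hamming distance translate directly into base R. The names graph_cov, graph_cor, hamming and offdiag are hypothetical helpers used only for this sketch (gcov, gcor and hdist are the package's own tools); both graphs are assumed to be loopless digraphs on the same labeled vertex set.

```r
offdiag <- function(M) M[row(M) != col(M)]      # the |V|(|V| - 1) edge variables

graph_cov <- function(A, B) {
  a <- offdiag(A); b <- offdiag(B)
  mean((a - mean(a)) * (b - mean(b)))           # sum divided by |V|(|V| - 1)
}
graph_cor <- function(A, B)
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
hamming <- function(A, B) sum(offdiag(A) != offdiag(B))

A <- matrix(c(0, 0, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 1, 0, 0), 4, 4, byrow = TRUE)
B <- 1 - A                                       # complement of A ...
diag(B) <- 0                                     # ... without loops
v_A  <- graph_cov(A, A)                          # graph variance of A
r_AB <- graph_cor(A, B)
h_AB <- hamming(A, B)
```

A graph and its complement disagree on every dyad, so their correlation is -1 and their Hamming distance is the number of ordered pairs.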

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
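The relabeling logic behind a simple QAP test can be sketched in base R as follows. The helpers offdiag, gcor_simple and qap_cor are hypothetical names, and the sketch covers only the bivariate graph-correlation case; qaptest and the net* routines are the supported tools and offer more options.

```r
offdiag <- function(M) M[row(M) != col(M)]
gcor_simple <- function(A, B) cor(offdiag(A), offdiag(B))   # graph correlation

# Monte Carlo QAP test: relabel the vertices of A at random (rows and columns
# permuted together) and recompute the statistic against the fixed graph B.
qap_cor <- function(A, B, reps = 1000) {
  n <- nrow(A)
  obs <- gcor_simple(A, B)
  sim <- replicate(reps, {
    o <- sample(n)
    gcor_simple(A[o, o], B)
  })
  list(obs = obs, p.upper = mean(sim >= obs), p.lower = mean(sim <= obs))
}

A <- matrix(0, 6, 6)
A[cbind(1:5, 2:6)] <- 1              # directed path 1 -> 2 -> ... -> 6
set.seed(7)
res <- qap_cor(A, A, reps = 500)     # a graph compared with itself
```

Permuting rows and columns jointly preserves the structure of the graph while breaking its alignment with the second graph, which is what distinguishes the QAP null from an unrestricted shuffle of edge variables.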


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> nl <- netlm(y, x)

R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1     Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]

R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))

R> y <- rgraph(20, tprob = yp)

R> nl <- netlogit(y, x)

R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173  0.680   0.320   0.503
x1           0.9411361  2.5628914  0.985   0.015   0.019
x2           4.1473292 63.2648084  1.000   0.000   0.000
x3           1.8630911  6.4436238  1.000   0.000   0.000
x4          -0.1757242  0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572     BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
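Of the classical estimators mentioned above, the central graph is the simplest to illustrate: the graph minimizing the total Hamming distance to a set of reports is obtained by taking the majority report on each edge. The base R sketch below uses a hypothetical helper, central_graph, and breaks exact ties toward the absence of an edge, which is an arbitrary choice made for this example rather than a statement about the behavior of consensus or centralgraph.

```r
# dat: m x n x n array of 0/1 reports, indexed (informant, sender, receiver).
central_graph <- function(dat) {
  share <- apply(dat, c(2, 3), mean)   # fraction of informants reporting each edge
  (share > 0.5) * 1                    # keep edges reported by a strict majority
}

# Three informants reporting on a two-vertex digraph.
dat <- array(0, dim = c(3, 2, 2))
dat[1, 1, 2] <- 1
dat[2, 1, 2] <- 1                      # two of three report 1 -> 2
dat[3, 2, 1] <- 1                      # only one reports 2 -> 1
cg <- central_graph(dat)
```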

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

\[
\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},
\]

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
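The conditional nomination probability above can be evaluated directly. The following base R sketch uses a hypothetical helper, bn_edge_prob, whose argument names are illustrative only: d stands for the baseline probability, bias_p for the vector of bias event probabilities, and s for the number of times each bias event could have occurred for the dyad.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
bn_edge_prob <- function(d, bias_p = numeric(0), s = numeric(0)) {
  stopifnot(length(bias_p) == length(s))
  1 - (1 - d) * prod((1 - bias_p)^s)
}

p_base <- bn_edge_prob(0.17)                       # no bias events apply
p_bias <- bn_edge_prob(0.17, bias_p = 0.5, s = 1)  # one bias event with probability 0.5
```

With no applicable bias events the expression reduces to the baseline probability, and each additional opportunity for a bias event can only raise the probability of nomination.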

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

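\[
y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \tag{4}
\]

\[
\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \tag{5}
\]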

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
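To make the roles of the AR and MA terms concrete, the following base R sketch simulates from Equations 4 and 5 in the special case of a single $W$ and a single $Z$, using the reduced forms $\varepsilon = (I - \psi Z)^{-1}\nu$ and $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$. The function sim_lnam and the ring-shaped weight matrix are assumptions made for illustration; estimation itself is the job of lnam.

```r
# Simulate one draw from a network autocorrelation model with one AR weight
# matrix W and one MA weight matrix Z.
sim_lnam <- function(W, Z, X, beta, theta, psi, sigma = 1) {
  n <- nrow(W)
  I <- diag(n)
  nu  <- rnorm(n, mean = 0, sd = sigma)          # iid disturbances
  eps <- solve(I - psi * Z, nu)                  # Equation 5, solved for eps
  y   <- solve(I - theta * W, X %*% beta + eps)  # Equation 4, solved for y
  list(y = drop(y), eps = eps)
}

set.seed(42)
n <- 5
W <- matrix(0, n, n)
W[cbind(1:n, c(2:n, 1))] <- 1                    # directed ring: i -> i + 1
X <- cbind(1, rnorm(n))                          # intercept plus one covariate
out <- sim_lnam(W, Z = W, X = X, beta = c(1, 2), theta = 0.3, psi = 0.2)

# The simulated response satisfies Equation 4 term by term.
resid <- out$y - 0.3 * drop(W %*% out$y) - drop(X %*% c(1, 2))
```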


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical Beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)

R> ep <- rbeta(20, 1, 25)

R> em <- rbeta(20, 15, 25)

R> dat <- array(dim = c(20, 20, 20))

R> for(i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))

R> pnet <- matrix(0.5, ncol = 20, nrow = 20)

R> pem <- matrix(nrow = 20, ncol = 2)

R> pem[,1] <- 2

R> pem[,2] <- 11

R> pep <- matrix(nrow = 20, ncol = 2)

R> pep[,1] <- 2

R> pep[,2] <- 11

R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)

R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution

e^- e^+Min 01443951 000042381stQ 03126975 00167584Median 03678306 00294646Mean 03783663 004936883rdQ 04423027 00574099Max 06909116 02262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

Min 1stQ Median Mean 3rdQ Maxo1 03132 03599 03798 03864 04073 05071o2 02613 02944 03115 03187 03419 03995

42 Social Network Analysis with sna

o3 04148 04724 04937 04948 05213 05649o4 02511 03075 03246 03257 03448 04085o5 01814 02417 02681 02678 02887 03434o6 02881 03531 03761 03766 04046 04488o7 02395 03028 03211 03244 03449 03951o8 01444 02011 02209 02212 02398 02922o9 03708 04358 04529 04578 04787 05503o10 03210 03724 03967 03982 04259 04751o11 03064 03847 04093 04109 04371 05007o12 02367 03132 03354 03349 03607 04455o13 03534 04144 04386 04382 04600 05337o14 02438 02985 03235 03229 03452 04184o15 02585 03299 03510 03519 03706 04704o16 02502 03298 03481 03509 03699 04268o17 01759 02273 02488 02503 02668 03372o18 03959 04468 04646 04710 04922 05812o19 04944 05736 06007 05975 06189 06909o20 03737 04433 04631 04671 04916 05607

Probability of False Positives (e^+)

Min 1stQ Median Mean 3rdQ Maxo1 00195433 00397919 00490722 00510872 00585109 01069030o2 01067928 01395067 01555455 01569023 01714084 02262239o3 00084268 00165518 00224858 00236948 00293221 00551761o4 00712109 01047058 01137249 01180402 01320136 01723854o5 00034994 00103378 00150617 00169536 00212638 00468961o6 00004238 00040509 00068522 00082363 00098606 00279960o7 00061597 00136434 00192100 00207973 00266508 00484633o8 00072124 00204896 00260316 00282562 00350608 00593586o9 00804463 01092987 01213202 01246571 01372326 01935724o10 00065188 00135991 00194675 00223006 00278075 00594150o11 00173415 00358252 00445098 00464278 00551955 00828446o12 00185894 00416346 00499440 00516976 00573815 01202316o13 00029818 00108936 00155202 00170049 00209790 00401566o14 00044849 00108034 00166631 00178764 00226294 00486647o15 00084143 00199868 00271149 00290795 00355966 00606914o16 00009067 00078736 00124531 00139218 00187929 00455700o17 00066611 00216195 00273388 00290307 00346110 00691573o18 00846863 01344580 01508170 01485688 01628176 02036186o19 00037608 00117982 00171030 00179751 00225298 00466090o20 00214701 00348032 00433397 00448676 00516594 00936080

MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
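To make the contrast between these estimators concrete, the following is a minimal base-R sketch of the LAS and central graph rules. It is not sna's implementation: the function names `las_consensus` and `central_graph`, the toy array `css`, and the "at least half of informants" threshold are all assumptions made for illustration.

```r
# Illustrative sketch only (base R, no sna dependency).
# css[k, i, j] = 1 if informant k reports a tie from i to j (hypothetical data layout).
las_consensus <- function(css, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(css)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sender <- css[i, i, j]    # i's own report of the i -> j tie
      receiver <- css[j, i, j]  # j's own report of the i -> j tie
      out[i, j] <- if (rule == "intersection") min(sender, receiver) else max(sender, receiver)
    }
  }
  out
}

central_graph <- function(css) {
  # Assumed threshold: a tie is kept when at least half of the informants report it
  (apply(css, c(2, 3), mean) >= 0.5) * 1
}

# Toy example: three informants who all report the true graph g ...
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
css <- array(rep(g, each = 3), dim = c(3, 3, 3))
# ... except that informant 1 omits its own tie to vertex 2 (a false negative)
css[1, 1, 2] <- 0

las_int <- las_consensus(css, "intersection")
las_uni <- las_consensus(css, "union")
cg <- central_graph(css)
```

In this toy case a single false negative by one of the two parties removes the tie from the intersection LAS, while the union LAS and the central graph still recover it, which is the mechanism behind the poor intersection result above.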

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
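For orientation, the data simulated in this example appear to follow the usual two-part network autocorrelation specification. Written out under that reading (with $\rho_1$ and $\rho_2$ corresponding to r1 and r2 in the code, $W_1$ and $W_2$ to w1 and w2, and $\nu$ to the innovation vector nu), the model is

$$ y = \rho_1 W_1 y + X\beta + e, \qquad e = \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I). $$

Solving each equation for its left-hand side gives the forms used to generate the data:

$$ e = (I - \rho_2 W_2)^{-1} \nu, \qquad y = (I - \rho_1 W_1)^{-1} (X\beta + e). $$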

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. [Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

R> infocent(dat)
 [1] 3.712599 3.102093 3.955891 2.695898 3.712425 3.413946 3.094442 3.425508
 [9] 3.077481 3.704181

As the above illustrate, the various standard centrality measures differ greatly in scale; they are, however, generally positively correlated. Other measures, such as the Bonacich power score (bonpow), have properties which can differ substantially depending on user-specified parameters. In the case of bonpow, we have already noted that the score's behavior is controlled by a decay parameter (set by the exponent argument), which determines the nature and strength of ego's dependency upon his or her alters. Simple calculations (shown below) verify that the bonpow measure is proportional to outdegree when exponent = 0, and is equivalent to eigenvector centrality when exponent is set to the reciprocal of the first eigenvalue of the adjacency matrix. bonpow's most interesting behavior occurs when exponent < 0, expressing the notion that ego becomes stronger when attached to weak alters (and vice versa). As the example below illustrates, the behavior of the measure in this case is essentially unrelated to both eigenvector and degree, reflecting a very different set of assumptions regarding the underlying social process.

R> bonpow(dat, exponent = 0) / degree(dat, cmode = "outdegree")
 [1] 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645 0.2192645
 [8] 0.2192645 0.2192645 0.2192645

R> all(abs(bonpow(dat, exponent = 1 / eigen(dat)$values[1], rescale = TRUE) -
+    evcent(dat, rescale = TRUE)) < 1e-10)
[1] TRUE

R> bonpow(dat, exponent = -0.5)
 [1]  1.0764391  1.2917269 -0.1230216  0.9534175  0.4613310  0.4920864
 [7]  0.4613310  0.9226621  0.3075540  2.1528782
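The proportionality to outdegree at exponent = 0 can be checked by hand. The sketch below is a base-R illustration rather than sna's own code: it uses the standard Bonacich form c = (I - beta A)^{-1} A 1, and the helper name `bonacich_power`, the toy matrix `A`, and the scaling convention (sum of squared scores equal to the number of vertices) are assumptions made for this example.

```r
# Illustrative sketch only (base R, no sna dependency).
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  # Solve (I - beta * A) c = A 1 for the unscaled power scores
  raw <- solve(diag(n) - beta * A, A %*% rep(1, n))
  # Assumed scaling convention: sum of squared scores equals n
  as.vector(raw * sqrt(n / sum(raw^2)))
}

# Small hypothetical digraph; row i holds the ties sent by vertex i
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 0, 0), 4, 4, byrow = TRUE)
outdeg <- rowSums(A)

bp0 <- bonacich_power(A, beta = 0)
ratio <- bp0[outdeg > 0] / outdeg[outdeg > 0]  # one constant value when beta = 0

bp_neg <- bonacich_power(A, beta = -0.5)       # negative exponent: ties to weak alters help
```

With beta = 0 the matrix inverse drops out and the scores reduce to scaled row sums, which is why the ratio to outdegree is a single constant; for negative beta no such simple relationship holds.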

As noted above, brokerage requires a vector of group memberships (i.e., vertex states) in addition to the network itself. Here, we randomly assign vertices to one of three groups, using the resulting vector to calculate brokerage scores.

R> memb <- sample(1:3, 10, replace = TRUE)
R> summary(brokerage(dat, memb))

Gould-Fernandez Brokerage Analysis

Global Brokerage Properties
           t    E(t)   Sd(t)       z Pr(>|z|)
w_I   5.0000  5.8638  2.7314 -0.3162   0.7518
w_O  25.0000 19.5459  7.0713  0.7713   0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484   0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090   0.6825
b_O  28.0000 23.4551  5.3349  0.8519   0.3943
t    93.0000 87.9565 13.6124  0.3705   0.7110

Individual Properties (by Group)

Group ID: 1
     w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1,]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2,]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3,]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4,]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
           b_O          t
[1,] -1.186381  0.8682544
[2,] -1.186381 -1.6099084
[3,] -1.186381 -0.3708270
[4,] -1.186381 -0.7838541

Group ID: 2
     w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1,]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2,]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
              t
[1,] -0.7838541
[2,]  1.4877951

Group ID: 3
     w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1,]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2,]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3,]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4,]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
           b_O           t
[1,] 3.0624213  2.31384939
[2,] 0.6345344  0.45522729
[3,] 0.6345344  0.04220016
[4,] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$ C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \qquad (1) $$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$ C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right], \qquad (2) $$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores; see footnote 2) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$; see footnote 3), although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

Footnote 2: For instance, when all vertices are automorphically equivalent.
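As a small worked check of the Freeman definition, the base-R sketch below computes raw and normalized outdegree centralization for a directed star without calling sna. The helper names and the toy graph are illustrative assumptions, and the normalizing constant (n - 1)^2 is the maximum deviation assumed here for outdegree in a loopless digraph.

```r
# Illustrative sketch only (base R, no sna dependency).
outdegree <- function(A) rowSums(A)

# Raw Freeman centralization: total deviation from the maximum score
centralization_raw <- function(scores) sum(max(scores) - scores)

n <- 6
star <- matrix(0, n, n)
star[1, -1] <- 1   # out-star: vertex 1 sends a tie to every other vertex

cent <- outdegree(star)
raw <- centralization_raw(cent)

# Equivalent form: number of vertices times (maximum minus mean)
raw_alt <- n * (max(cent) - mean(cent))

# Assumed theoretical maximum deviation for outdegree: (n - 1)^2
normalized <- raw / (n - 1)^2
```

Here the hub has outdegree 5 and every other vertex has outdegree 0, so the raw score is 5 * 5 = 25 under either form, and the normalized score is 1, the value expected for a graph of maximum concentration.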

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

Footnote 3: K_n is the complete graph on n vertices, with K_{n,m} denoting the complete bipartite graph on n and m vertices, and N_n the null or empty graph on n vertices.
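The gap between dyadic and edgewise reciprocity in the output above follows directly from the dyad counts. The following base-R sketch (not sna's implementation; `dyad_counts`, `recip`, and the toy matrix `A` are hypothetical names introduced here) computes both measures from the numbers of mutual, asymmetric, and null dyads.

```r
# Illustrative sketch only (base R, no sna dependency).
dyad_counts <- function(A) {
  diag(A) <- 0                 # ignore loops
  ut <- upper.tri(A)
  fwd <- A[ut]                 # i -> j for each pair with i < j
  bwd <- t(A)[ut]              # j -> i for the same pairs
  c(mutual = sum(fwd == 1 & bwd == 1),
    asymmetric = sum(fwd != bwd),
    null = sum(fwd == 0 & bwd == 0))
}

recip <- function(A, measure = c("dyadic", "edgewise")) {
  measure <- match.arg(measure)
  d <- dyad_counts(A)
  if (measure == "dyadic") {
    # Fraction of dyads that are symmetric (mutual or null)
    unname((d["mutual"] + d["null"]) / sum(d))
  } else {
    # Fraction of edges that are reciprocated
    unname(2 * d["mutual"] / (2 * d["mutual"] + d["asymmetric"]))
  }
}

# Toy digraph: 1 <-> 2 is mutual, 1 -> 3 is asymmetric, all other dyads are null
A <- matrix(0, 4, 4)
A[1, 2] <- 1; A[2, 1] <- 1; A[1, 3] <- 1

counts <- dyad_counts(A)
r_dyadic <- recip(A, "dyadic")
r_edge <- recip(A, "edgewise")
```

Because null dyads count as symmetric, a sparse graph with no mutual ties can score high on the dyadic measure while scoring zero on the edgewise one, which matches the pattern in the first graph of the example.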

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if (tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let G = (V, E) be a (possibly directed) graph of order N, and let d(i, j) be the geodesic distance from vertex i to vertex j in G. The "structure statistics" of G are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j,k) \le i)$ and I is the standard indicator function. Intuitively, $s_i$ is the expected fraction of G which lies within distance i of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
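The structure statistics are simple to compute once geodesic distances are available. The base-R sketch below is an illustration under stated assumptions, not sna's routine: it obtains distances with a Floyd-Warshall pass (a substitute for geodist), and the names `geodesic_dist` and `structure_stats` are hypothetical.

```r
# Illustrative sketch only (base R, no sna dependency).
geodesic_dist <- function(A) {
  n <- nrow(A)
  d <- ifelse(A > 0, 1, Inf)   # direct ties have length 1, everything else unreachable
  diag(d) <- 0                 # each vertex is at distance 0 from itself
  for (k in seq_len(n)) {
    # Floyd-Warshall update: allow paths that pass through vertex k
    d <- pmin(d, outer(d[, k], d[k, ], `+`))
  }
  d
}

structure_stats <- function(A) {
  n <- nrow(A)
  d <- geodesic_dist(A)
  # s_i = fraction of ordered pairs (j, k), including j = k, with d(j, k) <= i
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

# Directed path 1 -> 2 -> 3
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
s <- structure_stats(A)
```

For the directed path, three of the nine ordered pairs are at distance 0, two more at distance 1, and one more at distance 2, so the series is 3/9, 5/9, 6/9; it never reaches 1 because vertices 2 and 3 cannot reach their predecessors.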

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
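To make the logic of a CUG test concrete, the following base R sketch computes a dyad census and then compares the observed number of mutual dyads to draws from the uniform distribution on digraphs with the same order and number of edges. All function names here (dyad_census, rgraph_nm, cug_edges) are hypothetical helpers written for illustration; they are not the sna routines, and the conditioning on edge count is only one of the possible choices discussed above.

```r
# Dyad census of a 0/1 adjacency matrix (diagonal ignored).
dyad_census <- function(adj) {
  pair_sum <- (adj + t(adj))[upper.tri(adj)]  # 2 = mutual, 1 = asymmetric, 0 = null
  c(Mut = sum(pair_sum == 2), Asym = sum(pair_sum == 1), Null = sum(pair_sum == 0))
}

# Uniform draw of a loopless digraph with n vertices and m edges.
rgraph_nm <- function(n, m) {
  adj <- matrix(0, n, n)
  off_diag <- which(row(adj) != col(adj))
  adj[sample(off_diag, m)] <- 1
  adj
}

# Upper and lower CUG p-values for statistic stat, conditioning on order and edges.
cug_edges <- function(adj, stat, reps = 1000) {
  obs <- stat(adj)
  n <- nrow(adj)
  m <- sum(adj[row(adj) != col(adj)])
  sims <- replicate(reps, stat(rgraph_nm(n, m)))
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(1)
g <- matrix(0, 6, 6)
g[cbind(1:5, 2:6)] <- 1
g[cbind(2:6, 1:5)] <- 1   # five mutual dyads, ten edges in total
census <- dyad_census(g)
res <- cug_edges(g, function(a) dyad_census(a)[["Mut"]])
print(census)
print(res)
```

Because ten edges can form at most five mutual dyads, every simulated graph has a statistic no larger than the observed one, so the lower-tail p-value is 1 while the upper-tail p-value is small.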

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1, , ])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]

R> g4[2, , ] <- g2[1, , ]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:    0
        Mean:  -0.001288889
        3rdQ:   0.06666667
        Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:   0.1555556
        Med:    0.2222222
        Mean:   0.2215333
        3rdQ:   0.2888889
        Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
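The distance-then-cluster workflow described above can be sketched in base R as follows. The helper se_hamming is hypothetical and deliberately simplified: it compares the rows and columns of two vertices over third parties only, ignoring any ties between the two vertices being compared, and so it should not be read as a reimplementation of sedist. The clustering step uses the standard hclust and cutree functions.

```r
# Hamming distance between the tie profiles of each pair of vertices,
# computed over third parties only (a simplifying assumption).
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      others <- setdiff(seq_len(n), c(i, j))
      out_diff <- sum(adj[i, others] != adj[j, others])  # out-neighborhoods
      in_diff  <- sum(adj[others, i] != adj[others, j])  # in-neighborhoods
      d[i, j] <- d[j, i] <- out_diff + in_diff
    }
  }
  d
}

# Vertices 1 and 2 both send ties to 3 and 4; vertex 3 also sends a tie to 4.
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 1, 2, 2, 3), c(3, 4, 3, 4, 4))] <- 1

d <- se_hamming(adj)
hc <- hclust(as.dist(d))          # complete-linkage clustering of the distances
membership <- cutree(hc, k = 2)   # cut into two candidate positions
print(d)
print(membership)
```

In this toy graph, vertices 1 and 2 have identical profiles and fall into one cluster, while vertices 3 and 4 form the other.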

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i, j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
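For the density block type, the reduction step amounts to averaging the cells of each block. The sketch below shows this in base R for a user-supplied membership vector; block_density is a hypothetical helper rather than the blockmodel function itself, and it assumes that diagonal cells are ignored within a class.

```r
# Density blockmodel: mean cell value within each (class a, class b) block.
block_density <- function(adj, membership) {
  classes <- sort(unique(membership))
  k <- length(classes)
  dens <- matrix(NA_real_, k, k,
                 dimnames = list(as.character(classes), as.character(classes)))
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      rows <- which(membership == classes[a])
      cols <- which(membership == classes[b])
      block <- adj[rows, cols, drop = FALSE]
      # within a class, neglect the diagonal (self-ties)
      cells <- if (a == b) block[row(block) != col(block)] else as.vector(block)
      dens[a, b] <- if (length(cells) > 0) mean(cells) else NA_real_
    }
  }
  dens
}

# Vertices 1 and 2 send ties to 3 and 4; vertex 3 also sends a tie to 4.
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 1, 2, 2, 3), c(3, 4, 3, 4, 4))] <- 1
dens <- block_density(adj, membership = c(1, 1, 2, 2))
print(dens)
```

Here the block from class 1 to class 2 is complete (density 1), the reverse block is null, and the within-class block for class 2 has density 0.5 because only one of its two possible ties is present.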

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
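The expansion of a density blockmodel can be sketched in a few lines of base R. The function expand_density below is a hypothetical stand-in for illustration, not the blockmodel.expand routine; it assumes a square matrix of block densities and a vector giving the number of vertices to generate in each class, and it excludes loops.

```r
# Draw a digraph from an inhomogeneous Bernoulli model defined by block densities.
expand_density <- function(dens, block_sizes) {
  membership <- rep(seq_len(nrow(dens)), times = block_sizes)
  # each cell (i, j) receives the density of the block it belongs to
  p <- dens[membership, membership, drop = FALSE]
  n <- length(membership)
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0   # no loops
  g
}

set.seed(42)
dens <- matrix(c(0, 1,
                 0, 0.5), 2, byrow = TRUE)
g_new <- expand_density(dens, block_sizes = c(3, 2))
print(g_new)
```

Since the class sizes are supplied separately from the densities, the new graph need not have the same order as the graph from which the densities were estimated.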

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
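The boolean inner product interpretation of composition is easy to check by hand. The following base R sketch uses a hypothetical helper compose (it is not the sna operator) that follows the definition given above, and applies it to a single mutual dyad to show how loops can arise.

```r
# Composition as a boolean matrix product: (v, v') is an edge iff some v''
# has (v, v'') in g and (v'', v') in g_prime.
compose <- function(g, g_prime) {
  (g %*% g_prime > 0) * 1
}

a <- matrix(c(0, 1,
              1, 0), 2, byrow = TRUE)   # one mutual dyad, no loops
aa <- compose(a, a)
print(aa)
```

Composing the mutual dyad with itself yields a graph whose only edges are loops, since each vertex reaches itself in two steps; this is the case the text warns about when it advises against neglecting the diagonal.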

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)

R> g <- rgraph(20, tprob = gp)

R> eq <- equiv.clust(g)

R> b <- blockmodel(g, eq, h = 15)

R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))

R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$
\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}
$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
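The graph covariance, correlation, and Hamming distance for a pair of directed graphs on the same vertex set can be written out directly from the definitions. The base R helpers below (off_diag, graph_cov, graph_cor, hamming) are hypothetical names used for illustration and are not the sna functions; they assume loopless directed graphs stored as 0/1 adjacency matrices.

```r
# Edge variables of a loopless digraph: all off-diagonal cells.
off_diag <- function(a) a[row(a) != col(a)]

# Graph covariance: divides by |V|(|V| - 1), the number of edge variables.
graph_cov <- function(g, h) {
  x <- off_diag(g)
  y <- off_diag(h)
  mean((x - mean(x)) * (y - mean(y)))
}

graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}

# Hamming distance: number of edge variables on which the graphs differ.
hamming <- function(g, h) sum(off_diag(g) != off_diag(h))

g <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, byrow = TRUE)   # transitive tournament
h <- t(g)                                   # every edge reversed

print(graph_cor(g, g))
print(graph_cor(g, h))
print(hamming(g, h))
```

Reversing every edge of a tournament flips all six edge variables, so the two graphs are perfectly negatively correlated and lie at the maximum possible Hamming distance.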

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
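The permutation logic behind a simple QAP test can be sketched in base R as follows. The function qap_cor and its helpers are hypothetical and are not the qaptest routine; the sketch assumes two loopless directed graphs on the same vertex set, uses the graph correlation as the test statistic, and relabels the vertices of one graph by permuting its rows and columns together.

```r
off_diag <- function(a) a[row(a) != col(a)]
gcor_simple <- function(g, h) cor(off_diag(g), off_diag(h))

# QAP test of the graph correlation: compare the observed value with the
# values obtained after randomly relabeling the vertices of g.
qap_cor <- function(g, h, reps = 1000) {
  obs <- gcor_simple(g, h)
  n <- nrow(g)
  sims <- replicate(reps, {
    perm <- sample(n)                # random relabeling
    gcor_simple(g[perm, perm], h)    # rows and columns permuted together
  })
  list(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(7)
n <- 10
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0
h <- g   # identical graphs, so the observed correlation is 1
res <- qap_cor(g, h, reps = 200)
print(res)
```

Because the permutation preserves the entire structure of g and changes only its labeling, the reference distribution conditions on everything except the assignment of individuals to positions, which is what distinguishes QAP from the CUG tests described earlier.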


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)

R> g2 <- rgraph(5)

R> g3 <- rmperm(g2)

R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)

R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

R> nl <- netlm(y, x)

R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)

R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))

R> y <- rgraph(20, tprob = yp)

R> nl <- netlogit(y, x)

R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties: every dyad is predicted to be a tie, so none of the 39 observed non-ties is classified correctly. This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
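For intuition about the reporting process in the case where error rates are given as known, the posterior probability of a single edge can be computed by direct application of Bayes' rule. The sketch below is an illustration under stated assumptions, not the bbnam implementation: the function edge_posterior is hypothetical, reports are assumed conditionally independent across sources given the true edge state, and the prior edge probability is a single Bernoulli parameter.

```r
# Posterior probability that an edge is present, given binary reports from
# several sources with known false negative (em) and false positive (ep) rates.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  lik_edge    <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | edge present)
  lik_no_edge <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | edge absent)
  prior * lik_edge / (prior * lik_edge + (1 - prior) * lik_no_edge)
}

# Three informants, two of whom report the edge.
reports <- c(1, 1, 0)
post <- edge_posterior(reports, em = rep(0.375, 3), ep = rep(0.038, 3))

# Sources whose reports carry no information leave the prior unchanged.
post_flat <- edge_posterior(reports, em = rep(0.5, 3), ep = rep(0.5, 3), prior = 0.3)
print(c(post = post, post_flat = post_flat))
```

With a low false positive rate, even two positive reports out of three make the edge very likely, whereas a high false negative rate means that a single negative report counts for little. When error rates are unknown, as in the general model, they must be estimated jointly with the graph, which is what motivates the Gibbs sampler described above.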

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model, then, involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
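The conditional nomination probability of the biased net model is straightforward to evaluate once the bias probabilities and the counts of potential bias events are known. The helper bn_edge_prob below is a hypothetical illustration in base R; its arguments (a baseline probability d, a vector of bias event probabilities, and a vector of counts s) are named for this sketch only.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
bn_edge_prob <- function(d, bias_prob = numeric(0), s = numeric(0)) {
  1 - (1 - d) * prod((1 - bias_prob)^s)
}

# No bias events possible: the baseline probability is recovered.
p_base <- bn_edge_prob(0.17)

# One potential reciprocity bias event with probability 0.5
# (e.g., j has already nominated i).
p_recip <- bn_edge_prob(0.17, bias_prob = 0.5, s = 1)

print(c(baseline = p_base, with_reciprocity = p_recip))
```

With a baseline of 0.17 and a single potential bias event of probability 0.5, the nomination fails only if both the baseline event and the bias event fail, giving $1 - 0.83 \times 0.5 = 0.585$.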

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

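$$
y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \tag{4}
$$

$$
\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \tag{5}
$$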

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
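To see what the AR portion of the model implies for the responses, it can help to simulate from it. The base R sketch below assumes a single weight matrix and no MA term, in which case the model for $y$ has the reduced form $y = (I - \theta W)^{-1}(X\beta + \nu)$. The row normalization of W and all parameter values are arbitrary choices made for this illustration; this is a simulation of the generative model only, not the estimation performed by lnam.

```r
set.seed(3)
n <- 8

# Random loopless digraph, row-normalized so that each row sums to 1
# (rows without any ties are left as zeros).
W <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(W) <- 0
W <- W / pmax(rowSums(W), 1)

X <- cbind(1, rnorm(n))       # intercept plus one covariate
beta <- c(0.5, 2)
theta <- 0.4                  # AR parameter; |theta| < 1 keeps I - theta W invertible here
nu <- rnorm(n, sd = 0.1)      # i.i.d. disturbances

# Reduced form: solve (I - theta W) y = X beta + nu for y.
y <- solve(diag(n) - theta * W, X %*% beta + nu)

# The simulated responses satisfy the structural equation y = theta W y + X beta + nu.
resid_check <- max(abs(y - (theta * W %*% y + X %*% beta + nu)))
print(resid_check)
```

The reduced form makes the higher-order neighborhood effects explicit: expanding the inverse as a power series in $\theta W$ shows that each response depends on the covariates and disturbances of neighbors, neighbors of neighbors, and so on, with geometrically decaying weight.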


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)

R> ep <- rbeta(20, 1, 25)

R> em <- rbeta(20, 15, 25)

R> dat <- array(dim = c(20, 20, 20))

R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))

R> pnet <- matrix(0.5, ncol = 20, nrow = 20)

R> pem <- matrix(nrow = 20, ncol = 2)

R> pem[, 1] <- 2

R> pem[, 2] <- 11

R> pep <- matrix(nrow = 20, ncol = 2)

R> pep[, 1] <- 2

R> pep[, 2] <- 11

R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)

R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116   Med: 0.9992194   IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
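As a rough check on the "perverse region" condition, the prior mass on em + ep > 1 can be estimated by simulation. The following base R sketch is illustrative only and is not part of sna; the variable names, seed, and number of draws are arbitrary choices, and the two error rates are assumed to be independent a priori.

```r
# Monte Carlo estimate of the prior mass on the region em + ep > 1,
# assuming independent priors on the two error rates (an assumption
# made for this illustration).
set.seed(1)                           # arbitrary seed, for reproducibility
n.draws <- 1e5

# Beta(2, 11) priors, as used for all error rates in the example above
em.prior <- rbeta(n.draws, 2, 11)     # false negative rate draws
ep.prior <- rbeta(n.draws, 2, 11)     # false positive rate draws
perverse.mass <- mean(em.prior + ep.prior > 1)

# For contrast: flat (uniform) priors put about half their mass there
flat.mass <- mean(runif(n.draws) + runif(n.draws) > 1)

c(beta_2_11 = perverse.mass, uniform = flat.mass)
```

Under these assumptions the beta(2, 11) priors place almost no mass on the perverse region, whereas flat priors place roughly half of their mass there.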

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
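The behavior of the central graph can be illustrated without sna at all. The sketch below is a base R approximation under stated assumptions: it simulates informant reports with the same error model as the example above, takes the central graph to be the edgewise majority vote across informants, and contrasts it with a strict unanimity rule (which is not the LAS intersection, since the LAS uses only the reports of the two endpoints). All object names are hypothetical.

```r
# Base R sketch: majority-rule ("central graph") recovery of a true
# network from error-prone informant reports.
set.seed(42)                                   # arbitrary seed
n <- 20                                        # number of vertices
m <- 20                                        # number of informants

g <- matrix(rbinom(n * n, 1, 0.5), n, n)       # true (directed) network
diag(g) <- 0
em <- rbeta(m, 15, 25)                         # false negative rates
ep <- rbeta(m, 1, 25)                          # false positive rates

dat <- array(0, dim = c(m, n, n))
for (i in seq_len(m)) {
  p <- g * (1 - em[i]) + (1 - g) * ep[i]       # P(informant i reports an edge)
  r <- matrix(rbinom(n * n, 1, p), n, n)       # informant i's report
  diag(r) <- 0
  dat[i, , ] <- r
}

# Edge retained when more than half of the informants report it
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1
# Edge retained only when every informant reports it
unanimous <- apply(dat, c(2, 3), min)

acc.central <- mean(central == g)
acc.unanimous <- mean(unanimous == g)
c(central = acc.central, unanimous = acc.unanimous)
```

With false negative rates near 0.375, requiring unanimity discards nearly every true edge, while the majority rule remains accurate because each informant is still right more often than wrong.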

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
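For orientation, the data-generating process used in this example can be written out explicitly. The notation below is a reconstruction inferred from the simulation code that follows (r1, r2, w1, w2, x, beta, nu, and sigma there), not a formula quoted from the text:

```latex
\begin{aligned}
y   &= \rho_1 W_1 y + X\beta + e, \\
e   &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\text{so that}\quad
e   &= (I - \rho_2 W_2)^{-1} \nu, \\
y   &= (I - \rho_1 W_1)^{-1} (X\beta + e).
\end{aligned}
```

Here $\rho_1$ is the AR parameter acting on the response through $W_1$, and $\rho_2$ is the MA parameter acting on the disturbances through $W_2$; the last two lines are the forms solved numerically with qr.solve.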

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), Sociological Theories in Progress, Volume 2, pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), Spatial Statistics: Past, Present, and Future, pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), Computational Organizational Theory, pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), Models and Methods in Social Network Analysis, chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

w_O  25.0000 19.5459  7.0713  0.7713 0.4405
b_IO 18.0000 19.5459  6.2244 -0.2484 0.8039
b_OI 17.0000 19.5459  6.2244 -0.4090 0.6825
b_O  28.0000 23.4551  5.3349  0.8519 0.3943
t    93.0000 87.9565 13.6124  0.3705 0.7110

Individual Properties (by Group):

Group ID: 1
    w_I w_O b_IO b_OI b_O  t        w_I        w_O       b_IO       b_OI
[1]   3   2    3    5   0 13  2.4874100  0.1931462  0.4058476  1.4190904
[2]   0   0    1    0   0  1 -0.8042244 -1.1401201 -0.6073953 -1.1140168
[3]   0   2    4    1   0  7 -0.8042244  0.1931462  0.9124690 -0.6073953
[4]   0   1    1    3   0  5 -0.8042244 -0.4734869 -0.6073953  0.4058476
          b_O          t
[1] -1.186381  0.8682544
[2] -1.186381 -1.6099084
[3] -1.186381 -0.3708270
[4] -1.186381 -0.7838541

Group ID: 2
    w_I w_O b_IO b_OI b_O  t w_I        w_O       b_IO       b_OI        b_O
[1]   0   3    0    0   2  5 NaN 0.03375725 -0.7426778 -0.7426778 -0.7530719
[2]   0   6    0    0  10 16 NaN 1.52052825 -0.7426778 -0.7426778  2.4025111
             t
[1] -0.7838541
[2]  1.4877951

Group ID: 3
    w_I w_O b_IO b_OI b_O  t        w_I       w_O       b_IO       b_OI
[1]   1   4    6    2   7 20  0.2929871 1.5264125  1.9257119 -0.1007739
[2]   0   3    2    3   3 11 -0.8042244 0.8597794 -0.1007739  0.4058476
[3]   1   2    1    2   3  9  0.2929871 0.1931462 -0.6073953 -0.1007739
[4]   0   2    0    1   3  6 -0.8042244 0.1931462 -1.1140168 -0.6073953
          b_O           t
[1] 3.0624213  2.31384939
[2] 0.6345344  0.45522729
[3] 0.6345344  0.04220016
[4] 0.6345344 -0.57734055

Unlike the centrality routines described above, brokerage produces a range of output in addition to the raw brokerage scores. The first table consists of the observed aggregate brokerage scores by group for each of the brokerage roles (coordinator (w_I), itinerant broker (w_O), gatekeeper (b_IO), representative (b_OI), liaison (b_O), and combined (t)), along with the corresponding expectations, standard deviations, associated z-scores, and p-values under the Gould-Fernandez random association model (to which the caveats noted earlier apply). The second set of tables similarly provides the observed brokerage scores and G-F z-scores for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i,j),(j,k) \in E \Leftrightarrow (i,k) \in E$ for $(i,j,k) \in V$) and weak ($(i,j),(j,k) \in E \Rightarrow (i,k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a stricter indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right], \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^{*}(G) - \bar{c}(G) \right], \tag{2}$$

where $c^{*}(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores [2]) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$) [3], although this is not always the case; eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

[2] For instance, when all vertices are automorphically equivalent.
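To make the centralization definition concrete, the following base R sketch computes out-degree centralization directly from equations (1) and (2). It is an illustration only, not sna's implementation: the function name is hypothetical, and the normalizing constant (n - 1)^2 is the maximum total deviation assumed here for out-degree on a loopless directed graph, attained by an out-star.

```r
# Out-degree centralization from the definition (illustrative sketch).
freeman.centralization <- function(adj, normalize = TRUE) {
  n <- nrow(adj)
  diag(adj) <- 0                     # ignore loops
  cent <- rowSums(adj)               # out-degree of each vertex
  raw <- sum(max(cent) - cent)       # equation (1): total deviation from max
  # Equation (2) gives the same value: |V| * (max score - mean score)
  stopifnot(isTRUE(all.equal(raw, n * (max(cent) - mean(cent)))))
  if (!normalize) return(raw)
  raw / (n - 1)^2                    # assumed maximum deviation (out-star)
}

star <- matrix(0, 5, 5)
star[1, -1] <- 1                     # vertex 1 sends a tie to everyone else
complete <- matrix(1, 5, 5)          # every vertex tied to every other

freeman.centralization(star)                     # 1
freeman.centralization(complete)                 # 0
freeman.centralization(star, normalize = FALSE)  # 16
```

The star attains the maximum because one vertex holds all of the out-degree, while the complete graph scores zero because every vertex has the same out-degree.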

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

[3] $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
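The difference between density, dyadic reciprocity, and edgewise reciprocity is easy to see when the three are computed by hand from the dyad counts. The base R sketch below does so for a single loopless directed graph without missing data; the function name is hypothetical, and the formulas follow the verbal definitions given above rather than sna's source code.

```r
# Density and two reciprocity variants from the dyad counts (sketch).
gli.sketch <- function(adj) {
  diag(adj) <- 0
  n <- nrow(adj)
  ut <- upper.tri(adj)
  a <- adj[ut]                       # i -> j for each dyad with i < j
  b <- t(adj)[ut]                    # j -> i for the same dyads
  mutual <- sum(a == 1 & b == 1)
  null <- sum(a == 0 & b == 0)
  asym <- sum(a != b)
  c(density = sum(adj) / (n * (n - 1)),                 # edges / possible edges
    dyadic = (mutual + null) / (mutual + null + asym),  # symmetric dyads
    edgewise = 2 * mutual / (2 * mutual + asym))        # reciprocated edges
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- 1; adj[2, 1] <- 1       # one mutual dyad
adj[1, 3] <- 1; adj[3, 4] <- 1       # two asymmetric dyads
gli.sketch(adj)
# density 1/3, dyadic 2/3, edgewise 1/2
```

Because null dyads count as symmetric, dyadic reciprocity can be high in very sparse graphs even when no edge is reciprocated, which is the pattern seen for the first graph in the example above.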

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n - 1) * choose(n - 1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
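The definition of the structure statistics can be checked by hand on a small graph. The following is a minimal base R sketch, not the sna implementation: the helper names geodesic_matrix and structure_stats are hypothetical, and geodesics are obtained with a plain Floyd-Warshall pass rather than with geodist.

```r
# Hypothetical helpers (not sna functions); base R only.
geodesic_matrix <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  d[adj != 0] <- 1
  diag(d) <- 0
  # Floyd-Warshall: allow each vertex k in turn as an intermediate step
  for (k in seq_len(n)) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_matrix(adj)
  # s_i = N^-2 * #{(j, k) : d(j, k) <= i}, for i = 0, ..., N - 1
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed 3-cycle: 1 -> 2 -> 3 -> 1
cyc <- matrix(0, 3, 3)
cyc[cbind(1:3, c(2, 3, 1))] <- 1
structure_stats(cyc)  # 3/9, 6/9, 9/9
```

Note that $s_0 = 1/N$ always, since each vertex is at distance 0 from itself; this matches the leading value of 0.10 one would expect for a graph of order 10.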

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
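To make the logic of an upper-tailed CUG test concrete, the following base R sketch estimates $\Pr(t(H) \ge t(G))$ by Monte Carlo, conditioning on order and number of edges and using the mutual dyad count as the statistic $t$. It is an illustration under these assumptions only; the helper names (count_mutual, draw_given_edges, cug_upper_p) are hypothetical and do not correspond to the cugtest interface.

```r
# Hypothetical helpers (not sna functions); base R only.
count_mutual <- function(adj) {
  # A mutual dyad {i, j} has both i -> j and j -> i
  sum(adj * t(adj) * upper.tri(adj))
}

# Draw a digraph uniformly given order n and edge count m (no loops)
draw_given_edges <- function(n, m) {
  adj <- matrix(0, n, n)
  offdiag <- which(row(adj) != col(adj))
  adj[sample(offdiag, m)] <- 1
  adj
}

cug_upper_p <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  m <- sum(adj[row(adj) != col(adj)] != 0)
  obs <- stat(adj)
  sims <- replicate(reps, stat(draw_given_edges(n, m)))
  mean(sims >= obs)  # Monte Carlo estimate of Pr(t(H) >= t(G))
}

set.seed(1)
g <- matrix(0, 6, 6)
g[cbind(1:6, c(2, 1, 4, 3, 6, 5))] <- 1  # three mutual dyads, six edges
count_mutual(g)
cug_upper_p(g, count_mutual)
```

Because the observed graph places all six of its edges in mutual dyads, the estimated upper-tail p-value should be small; replacing the sampler changes the conditioning and hence the null hypothesis being tested.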

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012  102 021D 021U 021C 111D 111U 030T 030C  201 120D 120U
40.16 48.48 3.50 5.52 5.80 9.60 1.94 1.86 1.84 0.72 0.12 0.08 0.08
 120C  210  300
 0.30 0.00 0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102 021D 021U 021C 111D  111U 030T 030C  201 120D 120U
25.46 27.28 23.36 1.86 2.40 4.22 8.26 11.46 0.66 0.22 9.34 0.52 0.74
 120C  210  300
 1.34 2.28 0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

 003   012   102 021D 021U  021C  111D 111U 030T 030C  201 120D 120U
4.66 22.62 10.06 4.82 5.00 12.74 10.78 9.02 9.72 2.56 3.26 3.88 3.60
120C  210  300
8.40 7.38 1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:    -0.3333333
                1stQ:   -0.06666667
                Med:    0
                Mean:   -0.001288889
                3rdQ:   0.06666667
                Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:    -0.06666667
                1stQ:   0.1555556
                Med:    0.2222222
                Mean:   0.2215333
                3rdQ:   0.2888889
                Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies, in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
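The definition of exact structural equivalence for a simple graph translates directly into a set comparison. The sketch below is a base R illustration only; struct_equiv is a hypothetical helper name, not an sna function, and the undirected case alone is handled.

```r
# Exact structural equivalence in a simple (undirected, loopless) graph.
# `struct_equiv` is an illustrative name, not part of sna.
struct_equiv <- function(adj, v, w) {
  nv <- setdiff(which(adj[v, ] != 0), w)  # N(v) \ {w}
  nw <- setdiff(which(adj[w, ] != 0), v)  # N(w) \ {v}
  setequal(nv, nw)
}

# Star with hub 1 and leaves 2, 3, 4: all leaves share the same single alter
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1
struct_equiv(star, 2, 3)  # TRUE: both leaves have neighborhood {1}
struct_equiv(star, 1, 2)  # FALSE: hub and leaf have different alters
```

The leaves of the star thus form a single position, while the hub occupies a position of its own.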

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
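The distance-then-cluster workflow can be sketched in base R. The example below is a simplified illustration, not the sedist or equiv.clust implementation: se_hamming is a hypothetical helper that counts mismatches in ties to and from third parties only, ignoring the cells involving the compared pair itself.

```r
# Hamming-type structural equivalence distances, then hierarchical clustering.
# `se_hamming` is a hypothetical helper; it is not the sna implementation.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    keep <- setdiff(seq_len(n), c(i, j))  # third parties only
    d[i, j] <- sum(adj[i, keep] != adj[j, keep]) +  # mismatched out-ties
               sum(adj[keep, i] != adj[keep, j])    # mismatched in-ties
  }
  d
}

# Star with hub 1 and leaves 2, 3, 4
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1

d  <- se_hamming(star)
hc <- hclust(as.dist(d))   # base R hierarchical clustering
cl <- cutree(hc, k = 2)    # cut into two positions
cl
```

Here the three leaves are at distance 0 from one another and at distance 4 from the hub, so a two-cluster solution separates the hub from the leaves.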

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i, j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
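Density reduction and Bernoulli expansion can be illustrated in a few lines of base R. This is a sketch under simplifying assumptions (integer class labels 1, ..., k; diagonal cells neglected; single-vertex classes given within-block density 0 on expansion), and the helper names block_density and expand_density are hypothetical rather than the blockmodel or blockmodel.expand interfaces.

```r
# Density blockmodel reduction and Bernoulli expansion in base R.
# `block_density` and `expand_density` are illustrative names only.
block_density <- function(adj, classes) {
  k <- max(classes)
  dens <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) for (b in seq_len(k)) {
    cells <- adj[classes == a, classes == b, drop = FALSE]
    if (a == b) {
      m <- nrow(cells)  # neglect the diagonal within a class
      dens[a, b] <- if (m > 1) (sum(cells) - sum(diag(cells))) / (m * (m - 1)) else NA
    } else {
      dens[a, b] <- mean(cells)
    }
  }
  dens
}

expand_density <- function(dens, sizes) {
  classes <- rep(seq_along(sizes), sizes)
  p <- dens[classes, classes]  # vertex-level edge probability matrix
  p[is.na(p)] <- 0
  n <- length(classes)
  g <- matrix(rbinom(n * n, 1, p), n, n)  # independent Bernoulli draws
  diag(g) <- 0                            # no loops
  g
}

# Two classes; every member of class 1 sends ties to every member of class 2
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1
dens <- block_density(adj, c(1, 1, 2, 2))
dens
ge <- expand_density(dens, sizes = c(3, 3))  # new graph of order 6
```

As in the text, the expanded graph need not have the same order as the original; only the class sizes passed to the expansion step determine it.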

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
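The equivalence between composition and the boolean inner product of adjacency matrices is easy to verify directly. The following base R sketch uses a hypothetical helper, compose, which is distinct from sna's own operator, and shows how a loop can arise from two loopless graphs.

```r
# Graph composition as a boolean matrix product (base R sketch;
# `compose` is an illustrative name, distinct from sna's own operator).
compose <- function(a, b) {
  (a %*% b > 0) * 1
}

# G contains only 1 -> 2; G' contains only 2 -> 1.
g  <- matrix(0, 2, 2); g[1, 2]  <- 1
gp <- matrix(0, 2, 2); gp[2, 1] <- 1
compose(g, gp)  # single edge (1, 1): a loop, although G and G' have none
```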

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network, without changing the underlying group structure.

R> g.p <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = g.p)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> g.e <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> g.e

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \qquad (3)$$


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V|\,(|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
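The graph covariance, graph correlation, and Hamming distance for directed, loopless graphs can be written compactly in base R. The following is a minimal sketch of those definitions; offdiag, graph_cov, graph_cor, and hamming are hypothetical helper names, not the gcov, gcor, or hdist functions.

```r
# Graph covariance, correlation, and Hamming distance for directed graphs
# without loops. Helper names are illustrative, not the sna functions.
offdiag <- function(a) a[row(a) != col(a)]

graph_cov <- function(a, b) {
  x <- offdiag(a)
  y <- offdiag(b)
  sum((x - mean(x)) * (y - mean(y))) / length(x)  # length(x) = |V|(|V| - 1)
}
graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}
hamming <- function(a, b) sum(offdiag(a) != offdiag(b))

a <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
b <- t(a)        # every edge reversed
graph_cor(a, a)  # 1
graph_cor(a, b)  # -1: b has an edge exactly where a does not
hamming(a, b)    # 6: all off-diagonal cells differ
```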

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
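For very small graphs, the maximization over matchings can be carried out exhaustively. The base R sketch below treats every vertex relabeling as permissible and takes the maximum graph correlation over all of them; the helper names (perms, graph_cor, struct_cor) are hypothetical, and the approach is feasible only for a handful of vertices since the number of permutations grows factorially.

```r
# Structural correlation by exhaustive search over vertex relabelings.
# All helper names are illustrative; this is not the gscor implementation.
offdiag <- function(a) a[row(a) != col(a)]
graph_cor <- function(a, b) cor(offdiag(a), offdiag(b))

perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

struct_cor <- function(a, b) {
  max(sapply(perms(seq_len(nrow(b))), function(p) graph_cor(a, b[p, p])))
}

a <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
b <- a[c(3, 1, 2), c(3, 1, 2)]  # same structure, vertices relabeled
graph_cor(a, b)   # labeled correlation, below 1
struct_cor(a, b)  # 1, since the graphs are isomorphic
```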

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
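The QAP null hypothesis can be sketched in base R as a permutation test: the rows and columns of one adjacency matrix are permuted together, the statistic is recomputed, and the observed value is compared against the resulting distribution. The code below is an illustration for the graph correlation only; qap_cor_test and the other helper names are hypothetical and do not reproduce the qaptest interface.

```r
# QAP-style permutation test for the graph correlation (base R sketch).
# Helper names are illustrative; this is not the qaptest interface.
offdiag <- function(a) a[row(a) != col(a)]
graph_cor <- function(a, b) cor(offdiag(a), offdiag(b))

qap_cor_test <- function(a, b, reps = 1000) {
  obs <- graph_cor(a, b)
  n <- nrow(b)
  sims <- replicate(reps, {
    p <- sample(n)          # random relabeling of the vertices of b
    graph_cor(a, b[p, p])   # rows and columns permuted together
  })
  list(observed = obs,
       p_upper = mean(sims >= obs),
       p_lower = mean(sims <= obs))
}

set.seed(42)
a <- matrix(rbinom(64, 1, 0.4), 8, 8); diag(a) <- 0
noise <- matrix(rbinom(64, 1, 0.1), 8, 8)
b <- abs(a - noise); diag(b) <- 0   # copy of a with roughly 10% of cells flipped
res <- qap_cor_test(a, b, reps = 500)
res$observed
res$p_upper
```

Because relabeling preserves the structure of each graph while breaking the vertex correspondence between them, the test asks whether the observed association exceeds what the two structures would produce under an arbitrary matching.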


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
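When the error rates are treated as known, the reporting process described here gives a closed-form posterior for each edge, which is a useful check on intuition. The base R sketch below assumes independent reports given the true edge state, a Bernoulli prior on the edge, and per-informant false negative (em) and false positive (ep) rates; edge_posterior is a hypothetical helper and not part of the bbnam family.

```r
# Posterior probability of a single edge given informant reports and
# known error rates (illustrative sketch; not the bbnam implementation).
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  # reports: 0/1 vector, one entry per informant, for a single (i, j) pair
  l1 <- sum(dbinom(reports, 1, 1 - em, log = TRUE))  # log Pr(reports | edge present)
  l0 <- sum(dbinom(reports, 1, ep, log = TRUE))      # log Pr(reports | edge absent)
  1 / (1 + exp(log(1 - prior) - log(prior) + l0 - l1))
}

# One informant with em = 0.2, ep = 0.1 who reports a tie:
edge_posterior(1, em = 0.2, ep = 0.1)  # 0.8 / (0.8 + 0.1)

# Five informants, three of whom report the tie
edge_posterior(c(1, 1, 1, 0, 0), em = rep(0.375, 5), ep = rep(0.038, 5))
```

With a low false positive rate, even a bare majority of positive reports yields a posterior close to 1, since false alarms are far less likely than omissions under these assumed rates.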

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
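The conditional nomination probability is simple to evaluate numerically. The sketch below codes the formula directly; bn_edge_prob is a hypothetical helper (it is not the bn or rgbn interface), p_bias holds the bias event probabilities, and s holds the counts $s_k$ of potentially occurring bias events for the dyad in question. The single bias labeled pi is assumed, for illustration, to be a reciprocity bias that can occur once when the reverse tie is already present.

```r
# Conditional nomination probability under the biased net formula
# (illustrative helper; not part of sna).
bn_edge_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline probability 0.17 and one bias with probability 0.5
bn_edge_prob(0.17, p_bias = c(pi = 0.5), s = c(pi = 0))  # 0.17: no bias event possible
bn_edge_prob(0.17, p_bias = c(pi = 0.5), s = c(pi = 1))  # 1 - 0.83 * 0.5 = 0.585
```

The second value shows how a single potential bias event raises the nomination probability well above the baseline, which is the mechanism behind the elevated mutual dyad counts seen when a reciprocity bias is introduced.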

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
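Solving Equations 4 and 5 for $y$ gives a direct recipe for simulating from the model, which can be helpful when checking an estimation workflow. The base R sketch below assumes a single weight matrix for each of the AR and MA terms ($w = z = 1$) and that the matrices $I - \theta W$ and $I - \psi Z$ are invertible; sim_lnam is a hypothetical helper, not part of sna.

```r
# Simulate from a network autocorrelation model with one AR and one MA
# weight matrix (illustrative sketch; `sim_lnam` is not an sna function).
sim_lnam <- function(W, Z, X, beta, theta, psi, sigma = 1) {
  n <- nrow(X)
  I <- diag(n)
  nu  <- rnorm(n, 0, sigma)                      # iid disturbances
  eps <- solve(I - psi * Z, nu)                  # eps = psi Z eps + nu
  y   <- solve(I - theta * W, X %*% beta + eps)  # y = theta W y + X beta + eps
  drop(y)
}

set.seed(7)
n <- 5
W <- matrix(0, n, n)
W[cbind(1:n, c(2:n, 1))] <- 1   # directed ring; each row sums to 1
X <- cbind(1, rnorm(n))         # intercept plus one covariate
y <- sim_lnam(W, Z = W, X = X, beta = c(1, 2), theta = 0.3, psi = 0.2)
y
```

Setting sigma = 0 removes the disturbances, in which case the returned vector satisfies $y = \theta W y + X\beta$ exactly; this provides a quick sanity check on the sign and placement of the weight matrices.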


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
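One simple way to check a candidate pair of error priors against the "perverse" region is by simulation. The sketch below assumes independent beta(2, 11) priors on the false negative and false positive rates (as in the example above) and estimates the prior mass on em + ep > 1; the variable names and the seed are illustrative only.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n_draws <- 1e5
em_draw <- rbeta(n_draws, 2, 11)     # false negative rate under a beta(2, 11) prior
ep_draw <- rbeta(n_draws, 2, 11)     # false positive rate under a beta(2, 11) prior

# Monte Carlo estimate of the prior mass on the region em + ep > 1
perverse_mass <- mean(em_draw + ep_draw > 1)
perverse_mass
```

For priors of this shape the estimated mass should be very close to zero; a prior for which it is not small would be a candidate for revision before running bbnam.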

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
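The sensitivity of majority-style aggregation to error rates near 0.5 can be seen in a small simulation. The sketch below uses a plain majority rule over informant reports as a stand-in for the central graph; the helpers majority_graph and sim_accuracy are hypothetical, and the code is not the sna implementation.

```r
# Majority-rule aggregation of informant reports (illustrative helper, not
# the sna implementation). dat is an m x n x n array of 0/1 reports.
majority_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Fraction of cells recovered when every informant has false negative
# rate em and false positive rate ep.
sim_accuracy <- function(g_true, em, ep, m = 15) {
  n <- nrow(g_true)
  p <- g_true * (1 - em) + (1 - g_true) * ep   # Pr(report = 1) for each cell
  reports <- array(0, dim = c(m, n, n))
  for (i in seq_len(m)) {
    reports[i, , ] <- matrix(rbinom(n * n, 1, p), n, n)
  }
  mean(majority_graph(reports) == g_true)
}

set.seed(2)                                    # arbitrary seed
g_true <- matrix(rbinom(100, 1, 0.3), 10, 10)  # toy criterion graph
acc_low  <- sim_accuracy(g_true, em = 0.2, ep = 0.05)
acc_high <- sim_accuracy(g_true, em = 0.6, ep = 0.05)
c(acc_low, acc_high)
```

With a false negative rate above 0.5, most true ties are reported by fewer than half of the informants, so the majority rule drops them and accuracy falls accordingly.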

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

for each individual, organized by group. It should be noted that very small groups cannot support certain brokerage roles, and (likewise) certain brokerage roles can only be realized when a sufficient number of groups are present. z-scores are considered to be undefined when their associated role preconditions are unmet, and are returned as NaNs.

Graph-level indices

Like node-level indices, graph-level indices are intended to provide succinct numerical summaries of structural properties; in the latter case, however, the properties in question are those pertaining to global structure. Perhaps the simplest of the GLIs is density, conventionally defined as the fraction of potentially observable edges which are present within the graph. Density is computed within sna using the gden function, which returns the density scores for one or more input graphs (taking into account directedness, loops, and missing data where applicable). Two more fundamental GLI classes are the reciprocity and transitivity measures, computed within sna by grecip and gtrans, respectively. By default, grecip returns the fraction of dyads which are symmetric (i.e., mutual or null) within the input graph(s). It can, however, be employed to return the fraction of non-null dyads which are symmetric, or the fraction of reciprocated edges (the "edgewise" reciprocity). All of these correspond to slightly different notions of reciprocity, and are thus appropriate in somewhat different circumstances. Likewise, gtrans provides several options for assessing structural transitivity. Of particular importance is the distinction between transitivity in its strong ($(i, j), (j, k) \in E \Leftrightarrow (i, k) \in E$ for $(i, j, k) \in V$) and weak ($(i, j), (j, k) \in E \Rightarrow (i, k) \in E$) forms. Intuitively, weak transitivity constitutes the notion embodied in the familiar saying that "a friend of a friend is a friend": where a two-path exists from i to k, i should also be tied to k directly. Strong transitivity is akin to a notion of "third party support": direct ties occur if and only if supported by an associated two-path. Weak transitivity is preferred for most purposes, although strong transitivity may be of interest as a more strict indicator of local clustering. By default, gtrans returns the fraction of possible ordered triads which satisfy the appropriate condition (out of those at risk), although absolute counts of transitive triads can also be obtained.

Another classic family of indices which can be calculated using sna consists of the centralization scores. Following Freeman (1979), the centralization of graph G with respect to centrality measure c is given by

$$C(G) = \sum_{i=1}^{|V|} \left[ \left( \max_{v \in V} c(v, G) \right) - c(v_i, G) \right] \tag{1}$$

i.e., the total deviation from the maximum observed centrality score. This can be usefully rewritten as

$$C(G) = |V| \left[ c^*(G) - \bar{c}(G) \right] \tag{2}$$

where $c^*(G) = \max_{v \in V} c(v, G)$ and $\bar{c}(G) = \frac{1}{|V|} \sum_{i=1}^{|V|} c(v_i, G)$ are the maximum and mean centrality scores, respectively. The Freeman centralization index is thus equal to the difference between the maximum and mean centrality scores, scaled by the number of vertices; its dimensions are those of the underlying centrality measure. In practice, it is common to work with the normalized centralization score, obtained by dividing C(G) by its maximum across all graphs of the same order as G. This index is dimensionless, and varies between 0 (for a graph in which all vertices have the same centrality scores [2]) and 1 (for a graph of maximum concentration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$) [3], although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.

[2] For instance, when all vertices are automorphically equivalent.
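To make the definitions concrete, the following base R sketch computes the raw Freeman index and a normalized degree centralization for undirected simple graphs. The helper names are hypothetical, and the normalizing constant (n - 1)(n - 2) is the standard maximum for undirected degree centrality, assumed here rather than taken from sna.

```r
# Raw Freeman centralization: total deviation from the maximum score.
freeman_raw <- function(scores) sum(max(scores) - scores)

# Normalized degree centralization for an undirected simple graph given as a
# symmetric 0/1 adjacency matrix. The divisor (n - 1)(n - 2) is the raw index
# of the star graph on n vertices (assumed maximum for this measure).
degree_centralization <- function(adj) {
  n <- nrow(adj)
  deg <- rowSums(adj)
  freeman_raw(deg) / ((n - 1) * (n - 2))
}

star <- matrix(0, 6, 6)
star[1, -1] <- 1
star[-1, 1] <- 1                 # vertex 1 is tied to all others
complete <- 1 - diag(6)          # all vertices have equal degree

degree_centralization(star)
degree_centralization(complete)

# The rewritten form |V| [c*(G) - mean c(G)] gives the same raw value.
deg_star <- rowSums(star)
length(deg_star) * (max(deg_star) - mean(deg_star))
```

The star attains the maximum and the complete graph the minimum, which matches the two extremes described in the text.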

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices (supplied respectively by connectedness, efficiency, hierarchy, and lubness) describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333

R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667

R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714

R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE

R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923

R> gtrans(g, measure = "weakcensus")
[1]   0  21 106 254 582

R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0

[3] $K_n$ is the complete graph on n vertices, with $K_{n,m}$ denoting the complete bipartite graph on n and m vertices, and $N_n$ the null or empty graph on n vertices.
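The density and reciprocity definitions used in the example above can be cross-checked from first principles. The following base R sketch assumes a single directed graph without loops or missing data, stored as a 0/1 adjacency matrix; the helper names are hypothetical and the code does not handle the general cases that gden and grecip cover.

```r
# adj: n x n 0/1 adjacency matrix of a directed graph without loops.
density_dir <- function(adj) {
  n <- nrow(adj)
  sum(adj[row(adj) != col(adj)]) / (n * (n - 1))
}

# Counts of mutual, asymmetric, and null dyads.
dyad_counts <- function(adj) {
  ut <- upper.tri(adj)
  a <- adj[ut]                   # tie i -> j for each pair i < j
  b <- t(adj)[ut]                # tie j -> i for the same pair
  c(mutual = sum(a == 1 & b == 1),
    asymmetric = sum(a != b),
    null = sum(a == 0 & b == 0))
}

# Dyadic reciprocity: fraction of dyads that are mutual or null.
recip_dyadic <- function(adj) {
  d <- dyad_counts(adj)
  unname((d["mutual"] + d["null"]) / sum(d))
}

# Edgewise reciprocity: fraction of edges that are reciprocated.
recip_edgewise <- function(adj) {
  d <- dyad_counts(adj)
  unname(2 * d["mutual"] / (2 * d["mutual"] + d["asymmetric"]))
}

adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1      # one mutual dyad
adj[3, 4] <- 1                   # one asymmetric dyad
c(density_dir(adj), recip_dyadic(adj), recip_edgewise(adj))
```

In a sparse graph most dyads are null, so the dyadic measure can be high even when no edge is reciprocated; this is the pattern seen for the first graph in the example output, where the dyadic score is 0.87 and the edgewise score is 0.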

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395

R> centralization(g, betweenness)
[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n - 1) * choose(n - 1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I\left(d(j, k) \le i\right)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
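As a rough illustration of the definition above, the following base R sketch computes structure statistics directly from an adjacency matrix. It does not use sna; the helper names geodesic_dist and structure_stats are hypothetical, and the distance computation (repeated boolean matrix products) is chosen for brevity rather than speed.

```r
# Geodesic distances via repeated boolean matrix multiplication (base R only).
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  adj <- (adj != 0) * 1
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  reach <- diag(n)                      # pairs joined by a walk of length k (k = 0 here)
  for (k in seq_len(n - 1)) {
    reach <- ((reach %*% adj) > 0) * 1  # walks of length exactly k
    newly <- reach == 1 & is.infinite(d)
    d[newly] <- k                       # first time a pair is reached = geodesic distance
  }
  d
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), for i = 0, ..., N - 1
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_dist(adj)
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed path 1 -> 2 -> 3
path3 <- matrix(c(0, 1, 0,
                  0, 0, 1,
                  0, 0, 0), nrow = 3, byrow = TRUE)
s_path <- structure_stats(path3)
```

For the three-vertex directed path, only six of the nine ordered pairs are ever connected, so the series stops short of 1.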

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
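The logic of a CUG test can be sketched in a few lines of base R. The example below is only an illustration, not the cugtest implementation: it uses the number of mutual dyads as the test statistic and conditions on order and number of edges. The function names count_mutual, rgraph_nm, and cug_mutual are hypothetical.

```r
# Number of mutual dyads in a loopless digraph (each mutual dyad appears twice in the sum)
count_mutual <- function(adj) sum(adj * t(adj)) / 2

# Draw a loopless digraph uniformly given order n and edge count m
rgraph_nm <- function(n, m) {
  adj <- matrix(0, n, n)
  cells <- which(row(adj) != col(adj))           # off-diagonal cells only
  adj[cells[sample.int(length(cells), m)]] <- 1
  adj
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G))
cug_mutual <- function(adj, reps = 1000) {
  n <- nrow(adj)
  m <- sum(adj)
  obs <- count_mutual(adj)
  sim <- replicate(reps, count_mutual(rgraph_nm(n, m)))
  c(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

set.seed(1)
g <- matrix(0, 6, 6)
g[cbind(1:5, 2:6)] <- 1
g[cbind(2:6, 1:5)] <- 1     # a fully reciprocated chain: five mutual dyads, ten edges
res <- cug_mutual(g, reps = 200)
```

Because ten edges can form at most five mutual dyads, the lower-tail p-value in this toy case is necessarily 1, while the upper-tail value estimates how rarely a uniform draw with the same density is fully reciprocated.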

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16
R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43
R> component.dist(g3[1,,])
$membership
[1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
[1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:    -0.3333333
                1stQ:   -0.06666667
                Med:    0
                Mean:   -0.001288889
                3rdQ:   0.06666667
                Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:    -0.06666667
                1stQ:   0.1555556
                Med:    0.2222222
                Mean:   0.2215333
                3rdQ:   0.2888889
                Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
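To make the idea of a positional distance concrete, the following base R sketch computes a Hamming-style structural equivalence distance and clusters on it. It is a simplified stand-in rather than sedist itself: the function name se_hamming is hypothetical, and the treatment of the tie within each compared pair (counted as a mismatch only when the two directions differ) is an assumption made for illustration.

```r
# Hamming-style structural equivalence distance between all vertex pairs
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    others <- setdiff(seq_len(n), c(i, j))
    d[i, j] <- sum(adj[i, others] != adj[j, others]) +  # out-ties to third parties
               sum(adj[others, i] != adj[others, j]) +  # in-ties from third parties
               (adj[i, j] != adj[j, i])                 # tie within the pair itself
  }
  d
}

# Undirected star: vertex 1 is the hub, vertices 2 to 4 are pendants
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1

d <- se_hamming(star)
# Cutting the dendrogram at height 0 recovers the exact equivalence classes
positions <- cutree(hclust(as.dist(d)), h = 0)
```

In the star, the three pendants share the same single alter and therefore fall into one position, while the hub forms a position of its own.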


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i, j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
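A density blockmodel reduction of the kind described above can be sketched in base R as follows. This is a minimal illustration, not the blockmodel function: block_density is a hypothetical name, only the density block type is handled, and diagonal cells are neglected within a class.

```r
# Density block image of a graph, given a class membership vector
block_density <- function(adj, membership) {
  classes <- sort(unique(membership))
  k <- length(classes)
  dens <- matrix(NA_real_, k, k,
                 dimnames = list(as.character(classes), as.character(classes)))
  for (a in seq_len(k)) for (b in seq_len(k)) {
    rows <- which(membership == classes[a])
    cols <- which(membership == classes[b])
    block <- adj[rows, cols, drop = FALSE]
    if (a == b) {
      off <- row(block) != col(block)            # neglect the diagonal
      dens[a, b] <- if (any(off)) mean(block[off]) else NA_real_
    } else {
      dens[a, b] <- mean(block)                  # mean cell value in the block
    }
  }
  dens
}

# Undirected star with the hub in class 1 and the pendants in class 2
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1
bm <- block_density(star, c(1, 2, 2, 2))
```

Here the between-class blocks are complete (density 1), the pendant block is null (density 0), and the singleton hub block has no off-diagonal cells and is therefore left as NA.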

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
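The expansion step for density blocks amounts to building a vertex-level edge probability matrix and drawing independent Bernoulli edges from it. The base R sketch below illustrates this under simplifying assumptions (no missing block densities, loops suppressed); the name expand_density is hypothetical and this is not the blockmodel.expand code.

```r
# Draw a loopless digraph from a matrix of block densities.
# sizes[k] gives the number of vertices to create in class k.
expand_density <- function(dens, sizes) {
  membership <- rep(seq_along(sizes), times = sizes)
  p <- dens[membership, membership, drop = FALSE]  # vertex-level Bernoulli parameters
  n <- length(membership)
  adj <- matrix(rbinom(n * n, 1, p), n, n)         # independent edge draws
  diag(adj) <- 0
  adj
}

# A two-class image in which ties occur only between classes
dens <- matrix(c(0, 1,
                 1, 0), 2, 2, byrow = TRUE)
set.seed(3)
g_new <- expand_density(dens, sizes = c(2, 3))
```

Because the class sizes are supplied separately from the image, the expanded graph need not have the same order as the graph from which the densities were estimated.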

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
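The boolean inner product characterization of composition is easy to check in base R. The sketch below is illustrative only; compose is a hypothetical helper and is not the package's composition operator.

```r
# Graph composition as a boolean matrix product:
# (v, v') is an edge iff some v'' has (v, v'') in a and (v'', v') in b
compose <- function(a, b) ((a %*% b) > 0) * 1

# A single mutual dyad: 1 -> 2 and 2 -> 1, with no loops
dyad <- matrix(c(0, 1,
                 1, 0), 2, 2, byrow = TRUE)
dyad2 <- compose(dyad, dyad)
```

Composing the mutual dyad with itself yields a graph consisting solely of loops, which shows why the diagonal cannot be ignored in the result.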

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network, without changing the underlying group structure.

R> g.p <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = g.p)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> g.e <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> g.e
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
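Equation 3 and the associated correlation translate almost directly into base R. The sketch below is a minimal illustration for loopless digraphs on a common vertex set; graph_cov and graph_cor are hypothetical names and are not the package's gcov and gcor routines.

```r
# Graph covariance (Equation 3): average over the |V|(|V| - 1) ordered pairs i != j
graph_cov <- function(a, b) {
  off <- row(a) != col(a)
  mean((a[off] - mean(a[off])) * (b[off] - mean(b[off])))
}

# Graph correlation: covariance scaled by the two graph standard deviations
graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}

g <- matrix(c(0, 1, 1,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
g_comp <- 1 - g          # complement graph: every off-diagonal tie is flipped
diag(g_comp) <- 0

r_self <- graph_cor(g, g)
r_comp <- graph_cor(g, g_comp)
```

A graph is perfectly correlated with itself and perfectly negatively correlated with its complement, which provides a quick sanity check on any implementation.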

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
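The QAP null hypothesis described above can be illustrated with a short base R sketch: the vertex labels of one graph are permuted (rows and columns together), and the test statistic is recomputed on each relabeled copy. The function qap_cor is a hypothetical name, the statistic is the graph correlation computed with base cor() over off-diagonal cells, and the example is not the qaptest implementation.

```r
# Monte Carlo QAP test of the graph correlation between loopless digraphs a and b
qap_cor <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)                  # ordered pairs i != j
  stat <- function(m) cor(m[off], b[off])  # graph correlation against b
  obs <- stat(a)
  n <- nrow(a)
  sim <- replicate(reps, {
    p <- sample.int(n)                     # random relabeling of the vertices
    stat(a[p, p])                          # rows and columns permuted together
  })
  c(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

set.seed(7)
a <- matrix(c(0, 1, 1, 0, 0,
              0, 0, 1, 0, 0,
              0, 0, 0, 1, 0,
              1, 0, 0, 0, 1,
              0, 0, 0, 0, 0), 5, 5, byrow = TRUE)
res <- qap_cor(a, a, reps = 200)
```

Permuting rows and columns simultaneously preserves the structure of the graph while breaking its alignment with the second graph, which is what distinguishes this null from one that reshuffles individual edges.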


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> y.p <- apply(y.l, c(1, 2), function(a) {1 / (1 + exp(-a))})
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
        352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

        Total Fraction Correct: 0.8973684
        Fraction Predicted 1s Correct: 0.8973684
        Fraction Predicted 0s Correct: NaN
        False Negative Rate: 0
        False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
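For intuition about the reporting process, consider the special case in which the error rates are treated as known. The posterior probability of a single edge then follows from Bayes' rule, as in the base R sketch below. This is a simplified illustration and not what bbnam does in general (bbnam samples the error rates jointly with the graph); the function name edge_posterior and its arguments are hypothetical.

```r
# Posterior probability that an edge is present, given binary reports from
# several informants with known false negative (em) and false positive (ep) rates
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(reports == 1, 1 - em, em))  # likelihood if the edge is present
  lik0 <- prod(ifelse(reports == 1, ep, 1 - ep))  # likelihood if the edge is absent
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# One informant reports a tie; false negatives are far more common than false positives
post_one <- edge_posterior(reports = 1, em = 0.375, ep = 0.038)

# Three informants with the same error rates: two report a tie, one does not
post_three <- edge_posterior(reports = c(1, 1, 0), em = 0.375, ep = 0.038)
```

With a low false positive rate, even a single positive report is strongly informative, and a dissenting negative report carries comparatively little weight because omissions are common.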

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
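The conditional edge probability above is simple to evaluate numerically. The following base R sketch does so for illustrative parameter values; the function name pr_edge and its argument names are hypothetical, and the example values (baseline 0.17, a single bias with probability 0.5) are chosen only to echo the magnitudes used in the earlier rgbn example.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: baseline edge probability Pr(B_e)
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      vector of counts s_k of potential bias events for the (i, j) dyad
pr_edge <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

p_none <- pr_edge(p_base = 0.17, p_bias = 0.5, s = 0)  # no bias event can occur
p_one  <- pr_edge(p_base = 0.17, p_bias = 0.5, s = 1)  # one potential bias event
```

When no bias event is available the expression collapses to the baseline probability; each additional potential bias event can only raise the chance that the edge is observed.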

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \epsilon \tag{4}$$

$$\epsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \epsilon + \nu \tag{5}$$

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
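Equations 4 and 5 can be solved directly for a simulated draw, which may help in understanding how the AR and MA terms interact. The base R sketch below assumes a single AR matrix and a single MA matrix ($w = z = 1$), uses a row-normalized random digraph for both, and picks arbitrary parameter values; none of these choices come from the lnam function itself.

```r
set.seed(42)
n <- 50

# Hypothetical interaction matrix: a sparse random digraph, row-normalized
W <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(W) <- 0
W <- W / pmax(rowSums(W), 1)  # divide each row by its sum (rows with no ties left as zero)
Z <- W                        # assumption: the same matrix drives the MA term

theta <- 0.3                  # AR parameter (illustrative value)
psi <- 0.2                    # MA parameter (illustrative value)
X <- cbind(1, rnorm(n))       # intercept plus one covariate
beta <- c(1, 2)
nu <- rnorm(n)                # iid disturbances

# Equation 5 rearranged: (I - psi Z) eps = nu
eps <- solve(diag(n) - psi * Z, nu)
# Equation 4 rearranged: (I - theta W) y = X beta + eps
y <- solve(diag(n) - theta * W, X %*% beta + eps)

# Both defining equations should hold for the simulated draw, up to rounding error
resid4 <- max(abs(y - (theta * W %*% y + X %*% beta + eps)))
resid5 <- max(abs(eps - (psi * Z %*% eps + nu)))
```

Row normalization keeps the spectral radius of the interaction matrix at or below 1, so the linear systems are well defined for the parameter values used here.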


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 1.5, 2.5)
R> dat <- array(dim = c(20, 20, 20))
R> for (i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

  Replicate Chains: 5
  Burn Time: 300
  Draws per Chain: 20 Total Draws: 100
  Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116 Med: 0.9992194 IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
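One way to check how much prior mass falls in the "perverse" region is a simple Monte Carlo calculation. The sketch below, in base R, draws independent false negative and false positive rates from the beta(2, 11) priors used above and estimates the probability that their sum exceeds 1. A flat beta(1, 1) prior is included for contrast. Treating the two error rates as independent a priori is an assumption of this sketch, and all variable names are illustrative.

```r
set.seed(1)
n_draws <- 100000

# Prior draws under the beta(2, 11) priors used in the example
em_prior <- rbeta(n_draws, 2, 11)
ep_prior <- rbeta(n_draws, 2, 11)
perverse_mass <- mean(em_prior + ep_prior > 1)

# Contrast: flat beta(1, 1) priors on both error rates
flat_mass <- mean(rbeta(n_draws, 1, 1) + rbeta(n_draws, 1, 1) > 1)

c(beta_2_11 = perverse_mass, flat = flat_mass)
```

Under the beta(2, 11) priors the estimated mass is negligible, whereas flat priors place about half of their mass in the perverse region.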

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
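The intuition behind the central graph can be sketched in base R as a majority vote over informant reports. The code below is an illustration only: the common error rates (0.3 and 0.05), the use of 21 informants (an odd number, so that ties cannot occur), and all variable names are assumptions of this sketch. It does not call consensus, whose tie handling and options may differ.

```r
set.seed(42)
n <- 20      # vertices
m <- 21      # informants (odd, so a majority always exists)
em <- 0.3    # assumed common false negative rate
ep <- 0.05   # assumed common false positive rate

# True network: Bernoulli digraph without loops
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0

# Each informant reports an existing edge with probability 1 - em,
# and a non-existent edge with probability ep.
tprob <- g * (1 - em) + (1 - g) * ep
dat <- array(rbinom(m * n * n, 1, rep(tprob, each = m)), dim = c(m, n, n))

# Majority vote across informants for every ordered pair
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1
diag(central) <- 0
accuracy <- mean(central == g)
accuracy
```

Raising em toward 0.5 in this sketch makes the vote on true edges close to a coin flip, which is the degradation described above.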

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.)

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

centration). Generally, maximum centralization scores occur on the star graphs (i.e., $K_{1,n}$),³ although this is not always the case: eigenvector centralization, for instance, is maximized for the family $K_2 \cup N_n$. Within sna, both normalized and raw centralization scores may be obtained via the centralization function. Arbitrary centrality functions may be passed to centralization, which are used to generate the underlying score vector; in the normalized case, the centrality function is asked to return the theoretical maximum deviation as well. This is handled transparently for all included centrality functions within sna; the mechanism may also be employed with user-supplied functions, provided that they supply the required arguments. Examples are supplied in the sna manual.
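To make the normalization concrete, the following base R sketch computes a Freeman-style degree centralization for an undirected simple graph: the summed deviation of each vertex's degree from the maximum observed degree, divided by the deviation attained on a star graph, $(n-1)(n-2)$. The function name degree_centralization is hypothetical, and the code illustrates the idea rather than the centralization function itself.

```r
# Freeman-style degree centralization for an undirected simple graph,
# given as a symmetric 0/1 adjacency matrix (illustrative helper only).
degree_centralization <- function(adj, normalize = TRUE) {
  n <- nrow(adj)
  deg <- rowSums(adj)
  raw <- sum(max(deg) - deg)       # total deviation from the maximum score
  if (!normalize) return(raw)
  raw / ((n - 1) * (n - 2))        # deviation attained by the star graph
}

# Star graph on 6 vertices (vertex 1 is the hub)
star <- matrix(0, 6, 6)
star[1, -1] <- 1
star <- star + t(star)

# Ring on 6 vertices (every vertex has degree 2)
ring <- matrix(0, 6, 6)
ring[cbind(1:6, c(2:6, 1))] <- 1
ring <- ring + t(ring)

c(star = degree_centralization(star), ring = degree_centralization(ring))
```

The star attains the maximum normalized score of 1, while the ring, in which all degrees are equal, scores 0.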

In addition to the above, sna includes functions for GLIs such as Krackhardt's (1994) measures of informal organization. These indices, supplied respectively by connectedness, efficiency, hierarchy, and lubness, describe the extent to which the structure of an input graph approaches that of an outtree. hierarchy can also be used to calculate hierarchy based on simple reciprocity, as with grecip.

The use of sna's GLI routines is straightforward; calling with a graph or set thereof generally results in a vector of GLI scores (as in the following example). Note below the difference between the default (dyadic) and edgewise reciprocity, the standard and "census" variants of gtrans, and the various Krackhardt indices. hierarchy defaults to one minus the dyadic reciprocity (as shown), but other options are available. Similar selective behavior is employed elsewhere within sna (e.g., prestige).

R> g <- rgraph(10, 5, tprob = c(0.1, 0.25, 0.5, 0.75, 0.9))
R> gden(g)
[1] 0.06666667 0.31111111 0.54444444 0.72222222 0.93333333
R> grecip(g)
[1] 0.8666667 0.3777778 0.4888889 0.6666667 0.8666667
R> grecip(g, measure = "edgewise")
[1] 0.0000000 0.0000000 0.5306122 0.7692308 0.9285714
R> grecip(g) == 1 - hierarchy(g)
[1] TRUE TRUE TRUE TRUE TRUE
R> gtrans(g)
[1] 1.0000000 0.2957746 0.5047619 0.6809651 0.9326923
R> gtrans(g, measure = "weakcensus")

³ $K_n$ is the complete graph on $n$ vertices, with $K_{n,m}$ denoting the complete bipartite graph on $n$ and $m$ vertices, and $N_n$ the null or empty graph on $n$ vertices.

[1]   0  21 106 254 582
R> connectedness(g)
[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000
R> efficiency(g)
[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407
R> hierarchy(g, measure = "krackhardt")
[1] 1.0 0.2 0.0 0.0 0.0
R> lubness(g)
[1] 0.2 1.0 1.0 1.0 1.0
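The difference between the dyadic and edgewise reciprocity values in the output above can be reproduced from first principles. In the base R sketch below, dyadic reciprocity is taken to be the fraction of dyads that are symmetric (mutual or null), and edgewise reciprocity the fraction of edges that are reciprocated, which is consistent with the first graph above (density 0.067, dyadic reciprocity 0.867, edgewise reciprocity 0). The function recip is a hypothetical stand-in and is not the grecip implementation.

```r
# Dyadic and edgewise reciprocity for a 0/1 digraph without loops
# (illustrative helper; not the sna implementation).
recip <- function(adj, measure = c("dyadic", "edgewise")) {
  measure <- match.arg(measure)
  up <- upper.tri(adj)
  a <- adj[up]               # i -> j for each pair i < j
  b <- t(adj)[up]            # j -> i for the same pairs
  mutual <- sum(a == 1 & b == 1)
  null <- sum(a == 0 & b == 0)
  asym <- length(a) - mutual - null
  if (measure == "dyadic") {
    (mutual + null) / (mutual + null + asym)
  } else {
    2 * mutual / (2 * mutual + asym)   # NaN for an empty graph
  }
}

# Four vertices: one mutual dyad (1 <-> 2), two asymmetric dyads, three null dyads
adj <- matrix(0, 4, 4)
adj[rbind(c(1, 2), c(2, 1), c(1, 3), c(3, 4))] <- 1
c(dyadic = recip(adj), edgewise = recip(adj, "edgewise"))
```

In a sparse graph the many null dyads inflate the dyadic measure (here 4/6), while the edgewise measure (here 2/4) considers only the ties that are actually present.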

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example.

R> centralization(g, degree, cmode = "outdegree")
[1] 0.1728395
R> centralization(g, betweenness)
[1] 0
R> apply(g, 1, centralization, degree, cmode = "outdegree")
[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407
R> apply(g, 1, centralization, betweenness)
[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+    n <- NROW(dat)
+    if(tmaxdev)
+      return((n-1) * choose(n-1, 2))
+    odeg <- degree(dat, cmode = "outdegree")
+    choose(odeg, 2)
+  }
R> apply(g, 1, centralization, o2scent)
[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where $s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$ and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
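The definition of the structure statistics can be evaluated directly in base R. The sketch below avoids an explicit geodesic computation by noting that $d(j, k) \le i$ exactly when entry $(j, k)$ of $(I + A)^i$ is positive, where $A$ is the adjacency matrix. The function name structure_stats is an illustrative assumption, and the approach is chosen for brevity rather than efficiency.

```r
# Structure statistics s_0, ..., s_{N-1} of a (possibly directed) graph,
# computed from its 0/1 adjacency matrix (illustrative helper only).
structure_stats <- function(adj) {
  n <- nrow(adj)
  step <- ((adj > 0) | (diag(n) > 0)) * 1   # move along an edge, or stay put
  reach <- diag(n)                          # pairs within distance 0
  s <- numeric(n)
  s[1] <- sum(reach) / n^2                  # s_0
  for (i in seq_len(n - 1)) {
    reach <- ((reach %*% step) > 0) * 1     # pairs within distance i
    s[i + 1] <- sum(reach) / n^2            # s_i
  }
  names(s) <- 0:(n - 1)
  s
}

# Directed 3-cycle: 1 -> 2 -> 3 -> 1
cyc <- matrix(0, 3, 3)
cyc[rbind(c(1, 2), c(2, 3), c(3, 1))] <- 1
s_cyc <- structure_stats(cyc)
s_cyc
```

For the directed 3-cycle, each vertex reaches itself at distance 0, one further vertex at distance 1, and the last at distance 2, so the series is 1/3, 2/3, 1.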

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
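The logic of a CUG test can be sketched in a few lines of base R. The example below conditions on the order and the number of edges, takes the number of mutual dyads as the test statistic $t$, and approximates the upper and lower p-values by Monte Carlo. The helper functions, the toy graph, and the choice of 1000 replicates are all assumptions made for illustration; the code does not use or mimic the cugtest interface.

```r
set.seed(123)

# Test statistic: number of mutual dyads in a 0/1 digraph without loops
mutual_count <- function(adj) sum(adj * t(adj)) / 2

# Uniform draw of a loopless digraph with a fixed number of edges
draw_given_edges <- function(n, n_edges) {
  off_diag <- which(row(diag(n)) != col(diag(n)))
  h <- matrix(0, n, n)
  h[sample(off_diag, n_edges)] <- 1
  h
}

# Observed graph: 10 vertices, 8 edges arranged as 4 mutual dyads
n <- 10
g <- matrix(0, n, n)
pairs <- cbind(c(1, 3, 5, 7), c(2, 4, 6, 8))
g[pairs] <- 1
g[pairs[, 2:1]] <- 1

obs <- mutual_count(g)
sims <- replicate(1000, mutual_count(draw_given_edges(n, sum(g))))
p_upper <- mean(sims >= obs)   # estimate of Pr(t(H) >= t(G))
p_lower <- mean(sims <= obs)   # estimate of Pr(t(H) <= t(G))
c(observed = obs, p_upper = p_upper, p_lower = p_lower)
```

Because only about 0.3 mutual dyads are expected by chance in a 10-vertex digraph with 8 edges, the upper-tail p-value for the observed graph is very small, indicating more reciprocity than the edge count alone would produce.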

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)
  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density (the default when cmode is not given).

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:    0
        Mean:  -0.001288889
        3rdQ:   0.06666667
        Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:   0.1555556
        Med:    0.2222222
        Mean:   0.2215333
        3rdQ:   0.2888889
        Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex v is said to be structurally equivalent to vertex v' iff $N(v) \setminus \{v'\} = N(v') \setminus \{v\}$ (i.e., when v and v' have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
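The distance-then-cluster workflow described above can be sketched in base R. This is an illustration under stated assumptions rather than the sedist or equiv.clust implementation: se_hamming is a hypothetical helper that counts mismatches between the in- and out-neighborhoods of each vertex pair, ignoring the pair's own ties as in the definition of structural equivalence given earlier.

```r
# Base-R sketch: Hamming-type structural equivalence distances, then clustering.
# se_hamming() is an illustrative helper, not an sna function.
se_hamming <- function(a) {
  n <- nrow(a)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      k <- setdiff(seq_len(n), c(i, j))      # alters other than i and j
      d[i, j] <- sum(a[i, k] != a[j, k]) +   # out-neighborhood mismatches
        sum(a[k, i] != a[k, j])              # in-neighborhood mismatches
    }
  }
  d
}

# Toy digraph: vertices 1-2 each send ties to vertices 3-5
a <- matrix(0, 5, 5)
a[1:2, 3:5] <- 1

d <- se_hamming(a)                            # zero distance = exact equivalence
hc <- hclust(as.dist(d), method = "complete")
pos <- cutree(hc, k = 2)                      # position (class) membership
```

In this toy case the two positions are exact, so all within-position distances are zero; with noisy data one would inspect the dendrogram before choosing a cut.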

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the i, jth block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
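As a concrete illustration of the density block type, the following base-R sketch reduces a graph to a matrix of block densities given a membership vector. The helper block_density is a hypothetical name used for this example only; it is not the blockmodel function and omits the other block types.

```r
# Base-R sketch of a density blockmodel reduction.
# block_density() is an illustrative helper, not an sna function.
block_density <- function(a, cls) {
  ids <- sort(unique(cls))
  dens <- matrix(NA_real_, length(ids), length(ids),
                 dimnames = list(ids, ids))
  diag(a) <- NA                                # neglect the diagonal
  for (r in seq_along(ids)) {
    for (s in seq_along(ids)) {
      block <- a[cls == ids[r], cls == ids[s]]
      dens[r, s] <- mean(block, na.rm = TRUE)  # mean cell value in the block
    }
  }
  dens
}

# Toy digraph: class 1 (vertices 1-2) sends ties to class 2 (vertices 3-5)
a <- matrix(0, 5, 5)
a[1:2, 3:5] <- 1
cls <- c(1, 1, 2, 2, 2)

dens <- block_density(a, cls)
```

A class with a single member has no off-diagonal cells in its own block, so its diagonal block density is returned as NaN in this sketch.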

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
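The expansion step for density blocks can be sketched in base R as follows. This is a minimal illustration, assuming a square matrix of block densities and a vector of new class sizes; expand_blocks is a hypothetical helper and is not the blockmodel.expand function.

```r
# Base-R sketch of density blockmodel expansion.
# expand_blocks() is an illustrative helper, not an sna function.
expand_blocks <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)  # class of each new vertex
  p <- dens[cls, cls]                          # Bernoulli parameter matrix
  n <- length(cls)
  a <- matrix(rbinom(n * n, 1, p), n, n)       # one independent draw per cell
  diag(a) <- 0                                 # no loops
  a
}

# Deterministic image: class 1 always sends to class 2, nothing else occurs
dens <- matrix(c(0, 0, 1, 0), 2, 2)
ge <- expand_blocks(dens, sizes = c(3, 4))     # 7 vertices: 3 + 4

# Stochastic image: repeated draws give a Monte Carlo sample
set.seed(2)
dens2 <- matrix(c(0.8, 0.1, 0.1, 0.8), 2, 2)
draws <- replicate(100, sum(expand_blocks(dens2, c(5, 5))))
```

Because the class sizes are supplied separately from the image, the expanded graph need not have the same order as the graph from which the densities were estimated.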

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition G'' of graphs G and G' on vertex set V is the graph on V such that $(v, v') \in E(G'')$ iff there exists a vertex v'' such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
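The boolean inner product mentioned above is easy to write out directly. The sketch below uses a hypothetical helper, compose, on plain adjacency matrices; it illustrates the definition and is not the sna operator itself.

```r
# Base-R sketch of graph composition as a boolean matrix product.
# compose() is an illustrative helper, not an sna function.
compose <- function(a, b) (a %*% b > 0) * 1

# G has the single edge 1 -> 2; G' has edges 2 -> 1 and 2 -> 3
a <- matrix(0, 3, 3); a[1, 2] <- 1
b <- matrix(0, 3, 3); b[2, 1] <- 1; b[2, 3] <- 1

comp <- compose(a, b)
# comp has edges 1 -> 1 (a loop) and 1 -> 3, each via the intermediate vertex 2
```

The loop at vertex 1 arises even though neither input graph has loops, which is why the diagonal of a composition should be retained.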

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$
\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)} \tag{3}
$$

where $A^G$, $A^H$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
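For readers who want to check these definitions by hand, the following base-R sketch computes the graph covariance, graph correlation, and Hamming distance for a pair of small digraphs. The helpers off_diag, graph_cov, graph_cor, and hamming are hypothetical names for this example and are not sna functions.

```r
# Base-R sketch of graph covariance, correlation, and Hamming distance.
# All helpers below are illustrative, not part of sna.
off_diag <- function(a) a[row(a) != col(a)]   # the |V|(|V|-1) edge variables

graph_cov <- function(g, h) {
  x <- off_diag(g)
  y <- off_diag(h)
  mean((x - mean(x)) * (y - mean(y)))         # divides by |V|(|V|-1)
}

graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}

hamming <- function(g, h) sum(off_diag(g) != off_diag(h))

# G has edges 1 -> 2 and 2 -> 3; H has edges 1 -> 2 and 3 -> 1
g <- matrix(0, 3, 3); g[1, 2] <- 1; g[2, 3] <- 1
h <- matrix(0, 3, 3); h[1, 2] <- 1; h[3, 1] <- 1

rho <- graph_cor(g, h)   # (1/18) / (2/9) = 0.25
hd <- hamming(g, h)      # two edge changes take G into H
```

Since the covariance uses the population divisor in both numerator and denominator, the resulting correlation agrees with the ordinary product-moment correlation of the vectorized off-diagonal cells.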

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
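The logic of a simple QAP test can be sketched in base R: the vertices of one graph are relabeled at random (rows and columns permuted together), and the statistic is recomputed for each relabeling. This is an illustration under stated assumptions, not the qaptest implementation; qap_cor_test and its helpers are hypothetical names, and the data are simulated for the example.

```r
# Base-R sketch of a QAP test for the graph correlation.
# qap_cor_test() and its helpers are illustrative, not sna functions.
set.seed(7)

off_diag <- function(a) a[row(a) != col(a)]
graph_cor <- function(g, h) cor(off_diag(g), off_diag(h))

qap_cor_test <- function(g, h, reps = 1000) {
  obs <- graph_cor(g, h)
  n <- nrow(g)
  sim <- replicate(reps, {
    p <- sample(n)                 # random relabeling of the vertices of g
    graph_cor(g[p, p], h)          # rows and columns are permuted together
  })
  list(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

# Simulated data: h is a copy of g with five edge variables flipped
n <- 12
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0
h <- g
flip <- sample(which(row(g) != col(g)), 5)
h[flip] <- 1 - h[flip]

res <- qap_cor_test(g, h)
```

Because each permutation preserves the full structure of g, the reference distribution holds the unlabeled structure of both graphs fixed and varies only the assignment of individuals to positions.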


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)

R> gcor(g1, g2)
[1] -0.1336306

R> gcor(g1, g3)
[1] 0.08908708

R> gcor(g2, g3)
[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex i nominates vertex j when reached. Then the conditional probability of $e_{ij}$ is given by

$$
\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)}
$$

where T is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the (i, j) directed dyad). Bias events are taken to be independent Bernoulli trials given T, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
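The conditional tie probability above is a product of "no event" probabilities, which is straightforward to evaluate numerically. The sketch below is a direct transcription of that formula; bn_tie_prob is a hypothetical helper, and the numeric values are chosen for illustration only and are not tied to any particular parameterization used by rgbn or bn.

```r
# Base-R sketch of the biased net conditional tie probability.
# bn_tie_prob() is an illustrative helper, not an sna function.
#   p_base: Pr(B_e), the baseline tie probability
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      vector of counts s_k of potential occurrences of each bias event
bn_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# No bias events potentially present: the baseline probability is recovered
p0 <- bn_tie_prob(0.17, p_bias = c(0.5, 0.25), s = c(0, 0))

# One potential event of the first type and two of the second:
# 1 - 0.83 * 0.5 * 0.75^2 = 0.7665625
p1 <- bn_tie_prob(0.17, p_bias = c(0.5, 0.25), s = c(1, 2))
```

The tie fails to form only if the baseline event and every potential bias event all fail, which is why the probability rises quickly as the counts grow.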

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

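$$
y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon \tag{4}
$$

$$
\varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu \tag{5}
$$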

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
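To make the roles of the AR and MA terms concrete, the following base-R sketch simulates one draw from Equations 4 and 5 with a single weight matrix in each term. It is a minimal illustration under stated assumptions: the row-normalized random weight matrix, the choice Z = W, and all parameter values are made up for this example, and no estimation is performed.

```r
# Base-R sketch: simulate from a network autocorrelation model
# with one AR weight matrix W and one MA weight matrix Z (here Z = W).
# All values are illustrative assumptions.
set.seed(42)
n <- 50

# Random 0/1 digraph, row-normalized so that each row sums to 1 (or 0)
W <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(W) <- 0
W <- W / pmax(rowSums(W), 1)

X <- cbind(1, rnorm(n))          # intercept plus one covariate
beta <- c(1, 2)
theta <- 0.3                     # AR parameter
psi <- 0.2                       # MA parameter
nu <- rnorm(n, sd = 0.5)         # iid disturbances

I_n <- diag(n)
eps <- solve(I_n - psi * W, nu)                  # solves eps = psi W eps + nu
y <- solve(I_n - theta * W, X %*% beta + eps)    # solves y = theta W y + X beta + eps

# Residuals of the two defining equations (should be numerically zero)
r_ar <- y - (theta * W %*% y + X %*% beta + eps)
r_ma <- eps - (psi * W %*% eps + nu)
```

Row normalization keeps the spectral radius of W at or below 1, so both linear systems are well defined for parameter magnitudes below 1.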


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20  Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):

Max: 1.003116
Med: 0.9992194
IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
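To make the behavior of the simpler estimators concrete, the following base R sketch implements a central graph and the two LAS rules by hand. It is an illustration under stated assumptions rather than the code behind consensus: the function names central_graph and las, the toy data, and the error rates are all hypothetical, and the data are assumed to form an informant by sender by receiver array in which informant k is also vertex k.

```r
# dat: informant x sender x receiver array of 0/1 reports (assumed layout),
# where informant k is the same actor as vertex k.

# Central graph sketch: keep a tie when more than half of the informants report it.
central_graph <- function(dat) (apply(dat, c(2, 3), mean) > 0.5) * 1

# Locally aggregated structure sketch: consult only the reports of i and j
# themselves about the i -> j tie.
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    reports <- c(dat[i, i, j], dat[j, i, j])
    out[i, j] <- if (rule == "intersection") min(reports) else max(reports)
  }
  out
}

# Hypothetical toy data: a random "true" graph reported by 8 informants who
# miss many ties (false negatives) and invent few (false positives).
set.seed(42)
n <- 8
g_true <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g_true) <- 0
dat_toy <- array(0, c(n, n, n))
for (k in seq_len(n)) {
  miss  <- matrix(rbinom(n * n, 1, 0.40), n, n)  # 1 = a true tie goes unreported
  extra <- matrix(rbinom(n * n, 1, 0.05), n, n)  # 1 = an absent tie is reported
  report <- ifelse(g_true == 1, 1 - miss, extra)
  diag(report) <- 0
  dat_toy[k, , ] <- report
}

# Proportion of dyads classified correctly by each estimator
sapply(list(intersection = las(dat_toy, "intersection"),
            union        = las(dat_toy, "union"),
            central      = central_graph(dat_toy)),
       function(est) mean(est == g_true))
```

Because the intersection rule requires both parties to report a tie, every missed report removes the tie from the estimate, which is the mechanism behind the poor intersection LAS result above.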

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
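For reference, the data-generating process used in the lnam example can be written out explicitly. The display below is read directly off the two qr.solve calls in the simulation code, with r1 and r2 written as \rho_1 and \rho_2; the symbols \varepsilon and \nu stand for the objects e and nu in that code, and the normality of \nu follows from the rnorm call.

```latex
\begin{align*}
y &= \rho_1 W_1 y + X\beta + \varepsilon, &
\varepsilon &= \rho_2 W_2 \varepsilon + \nu, &
\nu &\sim N(0, \sigma^2 I), \\
\varepsilon &= (I - \rho_2 W_2)^{-1} \nu, &
y &= (I - \rho_1 W_1)^{-1} (X\beta + \varepsilon).
\end{align*}
```

The first line states the AR and MA components in implicit form, and the second gives the solved form used to simulate y. The fitted values of rho1.1, rho2.1, and Sigma in the summary output can be compared against the simulation settings of 0.2, 0.3, and 0.1.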

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical quantiles vs. sample quantiles); Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

[1] 0 21 106 254 582

R> connectedness(g)

[1] 0.4666667 1.0000000 1.0000000 1.0000000 1.0000000

R> efficiency(g)

[1] 1.00000000 0.76543210 0.50617284 0.30864198 0.07407407

R> hierarchy(g, measure = "krackhardt")

[1] 1.0 0.2 0.0 0.0 0.0

R> lubness(g)

[1] 0.2 1.0 1.0 1.0 1.0

centralization's usage differs somewhat from the above, as it acts as a wrapper for centrality routines (which must be specified, along with any additional arguments). By default, centralization scores are computed only for a single graph; R's apply (for arrays) or sapply (for lists) may be used to calculate scores for multiple graphs at once. Both forms are illustrated in the following example:

R> centralization(g, degree, cmode = "outdegree")

[1] 0.1728395

R> centralization(g, betweenness)

[1] 0

R> apply(g, 1, centralization, degree, cmode = "outdegree")

[1] 0.17283951 0.27160494 0.38271605 0.06172840 0.07407407

R> apply(g, 1, centralization, betweenness)

[1] 0.000000000 0.135802469 0.043467078 0.021237507 0.004151969

As noted above, centralization is compatible with any node-level index function which returns its theoretical maximum deviation when called with tmaxdev = TRUE. Consider, for instance, the following:

R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+   n <- NROW(dat)
+   if(tmaxdev)
+     return((n-1) * choose(n-1, 2))
+   odeg <- degree(dat, cmode = "outdegree")
+   choose(odeg, 2)
+ }

R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization “for free” when working with their own centrality routines, so long as they support the required calling argument.
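The calling convention described above can be sketched in base R. The following is a simplified stand-in, not sna's centralization itself: centralization_sketch and outdeg_index are hypothetical names, and the normalizer (n-1)^2 assumes a loopless digraph in which a single vertex sends ties to all others.

```r
# Freeman-style centralization sketch: total deviation from the most central
# vertex, divided by the largest total deviation attainable on n vertices.
centralization_sketch <- function(adj, index_fun) {
  scores <- index_fun(adj)
  tmax <- index_fun(adj, tmaxdev = TRUE)
  sum(max(scores) - scores) / tmax
}

# A node-level index obeying the tmaxdev convention (outdegree, no loops).
outdeg_index <- function(adj, tmaxdev = FALSE) {
  n <- NROW(adj)
  if (tmaxdev)
    return((n - 1)^2)  # out-star: one vertex scores n-1, the other n-1 score 0
  rowSums(adj)
}

# An out-star on four vertices is maximally centralized under this index.
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
centralization_sketch(star, outdeg_index)  # 9 / 9 = 1
```

Any index function with the same two behaviors (scores by default, maximum deviation when tmaxdev = TRUE) could be substituted for outdeg_index without changing the wrapper.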

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The “structure statistics” of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983) and Skvoretz et al. (2004) for related results.)
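As an illustration of the definition above, the structure statistics can be computed in a few lines of base R. This is a sketch under stated assumptions, not the implementation used by sna: geodesic_sketch and structure_stats_sketch are hypothetical names, distances are obtained by repeated boolean matrix products, and unreachable pairs are given distance Inf.

```r
# Geodesic distances via repeated boolean matrix products (breadth-first
# search in matrix form). Unreachable pairs keep distance Inf.
geodesic_sketch <- function(adj) {
  n <- nrow(adj)
  a <- (adj != 0) * 1
  reach <- diag(n)                 # pairs joined by a walk of length <= k
  d <- matrix(Inf, n, n)
  diag(d) <- 0
  for (k in seq_len(n - 1)) {
    new_reach <- ((reach %*% a + reach) > 0) * 1
    d[new_reach == 1 & reach == 0] <- k   # pairs first reached at step k
    if (all(new_reach == reach)) break
    reach <- new_reach
  }
  d
}

# s_i = N^-2 * (number of ordered pairs (j, k) with d(j, k) <= i), i = 0..N-1
structure_stats_sketch <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_sketch(adj)
  sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
}

# Directed 3-cycle 1 -> 2 -> 3 -> 1: each vertex reaches one more vertex per step.
cyc <- matrix(0, 3, 3)
cyc[cbind(1:3, c(2, 3, 1))] <- 1
structure_stats_sketch(cyc)  # 1/3, 2/3, 1
```

Note that the double sum includes the pairs with j = k, so s_0 equals 1/N for any graph; this agrees with the value 0.10 reported for the ten-vertex graph in the example of this section.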

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of G, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of G is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions consists of those which satisfy the conditions $\mathbf{E}\,s(H) = s(G), \mathbf{E}\,s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter p equal to the density of G is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
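The logic of a one-tailed CUG test can be sketched directly in base R. The code below is an illustrative approximation, not cugtest: count_mutual, rgraph_edges, and cug_sketch are hypothetical names, the test statistic is the number of mutual dyads, and the null distribution conditions on the order and the number of edges of a loopless digraph.

```r
# Number of mutual dyads in a 0/1 adjacency matrix with a zero diagonal
count_mutual <- function(adj) sum(adj * t(adj)) / 2

# Draw a loopless digraph uniformly given its order n and edge count m (n >= 2)
rgraph_edges <- function(n, m) {
  adj <- matrix(0, n, n)
  off_diag <- which(row(adj) != col(adj))
  adj[sample(off_diag, m)] <- 1
  adj
}

# Monte Carlo estimates of Pr(t(H) >= t(G)) and Pr(t(H) <= t(G))
cug_sketch <- function(adj, stat, reps = 1000) {
  obs <- stat(adj)
  sim <- replicate(reps, stat(rgraph_edges(nrow(adj), sum(adj))))
  list(obs = obs, p_upper = mean(sim >= obs), p_lower = mean(sim <= obs))
}

# A graph made only of reciprocated pairs: (1,2), (3,4), (5,6)
set.seed(1)
recip <- matrix(0, 6, 6)
recip[cbind(1:6, c(2, 1, 4, 3, 6, 5))] <- 1
cug_sketch(recip, count_mutual)
```

Because draws equal to the observed value count toward both tails, the two estimated p-values always sum to at least 1, as they do in the cugtest output shown in the example below.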

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))

R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))

R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+   dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1,,], maxlen = 5,
+   cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]

R> g4[2,,] <- g2[1,,]

R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+   g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min: -0.3333333
        1stQ: -0.06666667
        Med: 0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max: 0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)

R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min: -0.06666667
        1stQ: 0.1555556
        Med: 0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max: 0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of “role” and “position” have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph G, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies, in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are “similar” in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
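The two-step procedure described above (positional distances, then hierarchical clustering) can be sketched with base R alone. The distance function below is a simplified Hamming variant written for illustration and is not claimed to match sedist: se_hamming is a hypothetical name, the diagonal is ignored, and the i to j tie is compared with the j to i tie so that exchanging the two vertices is handled consistently.

```r
# Hamming-style structural equivalence distance between all vertex pairs
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next                                    # d[i, i] stays 0
    others <- setdiff(seq_len(n), c(i, j))
    d[i, j] <- sum(adj[i, others] != adj[j, others]) +  # differing out-ties
               sum(adj[others, i] != adj[others, j]) +  # differing in-ties
               (adj[i, j] != adj[j, i])                 # tie between the pair
  }
  d
}

# Two exact positions: vertices 1-3 all send to vertices 4-6 and nothing else.
two_block <- matrix(0, 6, 6)
two_block[1:3, 4:6] <- 1

# Cluster on the distances with R's hclust, then cut into two classes.
ec <- hclust(as.dist(se_hamming(two_block)), method = "complete")
classes <- cutree(ec, k = 2)
classes  # vertices 1-3 fall in one class, 4-6 in the other
```

With exact structural equivalence, as here, within-class distances are zero; for empirical data the dendrogram height at which classes merge indicates how approximate the equivalence is.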

After clustering the next phase of a positional analysis is frequently blockmodeling Given aset of equivalence classes (in the form of an equivclust or hclust object or membershipvector) and one or more graphs blockmodel will form a blockmodel of the input graph(s)based on the classes in question using the specified block content type A blockmodel can bethought of as a generalized relational structure on a set of vertex classes The relationshipbetween the ith and jth class is said to be the i jth block whose content is referred to as itscorresponding block type (This terminology originates from the observation that permutingthe rows and columns of an adjacency matrix by vertex class can lead toldquoblocksrdquoof discerniblestructure in the permuted matrix For instance blocks among structural equivalence classesare comprised entirely of 1s or 0s neglecting the diagonal) Unless a vector of classes isspecified blockmodel forms its eponymous models by using Rrsquos cutree function to cut anequivalence by height or number of clusters (as specified) After forming clusters (classes)the input graphs are reordered by class and blockmodel reduction is applied Block typescurrently supported include quantitative forms such as density (mean value of the cells in theassociated adjacency matrix) row or column sums cell value descriptives and categoricaltypes (eg null 1-covered etc) Once a given reduction is performed the block structureitself can be analyzed andor expansion can be used to generate new graphs based on theimage structure

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
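The expansion of a density image can be sketched in a few lines of base R. The function expand_density and its arguments are assumptions made for this illustration (they do not mirror the signature of blockmodel.expand); the sketch only shows the idea of treating block densities as Bernoulli edge probabilities.

```r
# Illustrative sketch: expand a density block image into a random digraph
# by treating each interclass density as a Bernoulli edge probability.
expand_density <- function(img, sizes) {
  classes <- rep(seq_len(nrow(img)), times = sizes)  # class of each new vertex
  p <- img[classes, classes, drop = FALSE]           # vertex-level probabilities
  n <- length(classes)
  g <- matrix(rbinom(n * n, 1, p), n, n)             # independent Bernoulli draws
  diag(g) <- 0                                       # no loops
  g
}

set.seed(1)
img <- matrix(c(0.9, 0.1,
                0.0, 1.0), 2, 2, byrow = TRUE)
g_new <- expand_density(img, sizes = c(3, 2))        # 3 + 2 = 5 vertices
print(g_new)
```

Calling expand_density repeatedly with the same image yields independent draws, which is the pattern needed for a Monte Carlo reference distribution.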

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
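The boolean inner product can be written directly in base R, which also makes the remark about loops easy to verify. The function name compose is an illustrative choice; within sna the %c% operator serves this purpose.

```r
# Boolean inner product of two adjacency matrices (base R only).
compose <- function(a, b) {
  (a %*% b > 0) * 1   # entry (i, j) is 1 iff some k has a[i, k] = 1 and b[k, j] = 1
}

# G contains 1 -> 2; G' contains 2 -> 1 and 2 -> 3. Neither has loops.
g1 <- matrix(0, 3, 3); g1[1, 2] <- 1
g2 <- matrix(0, 3, 3); g2[2, 1] <- 1; g2[2, 3] <- 1
comp <- compose(g1, g2)
print(comp)
```

The composed graph contains the edge 1 -> 3 and also the loop 1 -> 1, even though both inputs are loopless.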

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V| \, (|V| - 1)}, \qquad (3)$$

where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V| (|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation is $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G) \, \mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
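For readers who wish to check the definitions by hand, the following base-R sketch computes the graph covariance of Equation 3, the graph correlation, and the Hamming distance for loopless digraphs. The function names are illustrative and the code makes no attempt to reproduce the options of gcov, gcor, or hdist.

```r
# Illustrative base-R versions of the graph covariance, graph correlation,
# and Hamming distance for directed graphs without loops.
offdiag <- function(a) a[row(a) != col(a)]         # the |V|(|V| - 1) edge variables

graph_cov <- function(a, b) {
  x <- offdiag(a)
  y <- offdiag(b)
  mean((x - mean(x)) * (y - mean(y)))              # divides by |V|(|V| - 1)
}
graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}
hamming <- function(a, b) sum(offdiag(a) != offdiag(b))

a <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
b <- t(a)                                          # every edge reversed
print(graph_cor(a, a))
print(graph_cor(a, b))
print(hamming(a, b))
```

Here b has an edge exactly where a does not (off the diagonal), so the two graphs are perfectly negatively correlated and differ on all six edge variables.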

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
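The idea of optimizing over matchings can be illustrated by brute force on very small graphs. The sketch below assumes that all vertices are exchangeable and searches every relabeling, so its cost grows factorially; perms and struct_hamming are hypothetical helpers for this illustration, not sna functions.

```r
# Exhaustive-search sketch of a structural Hamming distance for small graphs
# (all vertices treated as exchangeable; cost grows as n!).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

struct_hamming <- function(a, b) {
  best <- Inf
  for (p in perms(seq_len(nrow(a)))) {
    d <- sum(a != b[p, p])          # relabel b, then count edge disagreements
    best <- min(best, d)
  }
  best
}

a <- matrix(0, 4, 4); a[1, 2] <- 1; a[2, 3] <- 1   # path 1 -> 2 -> 3
b <- matrix(0, 4, 4); b[4, 1] <- 1; b[1, 3] <- 1   # path 4 -> 1 -> 3 (isomorphic)
print(sum(a != b))            # labeled Hamming distance
print(struct_hamming(a, b))   # distance after optimal relabeling
```

The two graphs share no labeled edges, yet they are isomorphic, so the whole labeled distance is attributable to the labeling component.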

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
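The logic of a QAP test for a bivariate statistic can be sketched in base R: the rows and columns of one graph are permuted together, and the statistic is recomputed on each relabeled graph. The functions edge_cor and qap_sketch, the number of replications, and the simulated data are all assumptions of this illustration; qaptest should be preferred in practice.

```r
# Minimal QAP sketch for a bivariate statistic (here, the correlation
# between off-diagonal edge variables).
edge_cor <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

qap_sketch <- function(a, b, reps = 500) {
  obs <- edge_cor(a, b)
  null <- replicate(reps, {
    p <- sample(nrow(b))            # random relabeling of the vertices of b
    edge_cor(a, b[p, p])            # rows and columns permuted together
  })
  list(observed = obs, p_upper = mean(null >= obs))
}

set.seed(42)
n <- 15
a <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(a) <- 0
b <- a                              # b is a noisy copy of a
flip <- matrix(rbinom(n * n, 1, 0.1), n, n) == 1
b[flip] <- 1 - b[flip]; diag(b) <- 0
res <- qap_sketch(a, b)
print(res)
```

Because b is a lightly perturbed copy of a, the observed correlation should sit far in the upper tail of the permutation distribution.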

Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
     352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not effective at predicting the absence of ties (no 0s are predicted at all); this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
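The reparameterization mentioned above can be made explicit under the know-or-guess reporting process as described in the text: a source with competency c reports the truth with probability c, and otherwise guesses that an edge is present with probability equal to its bias. The sketch below works through the resulting mapping; the function and argument names are illustrative assumptions and do not correspond to sna arguments.

```r
# Map between the competency/bias parameterization and the false positive
# (ep) / false negative (em) parameterization, assuming the know-or-guess
# reporting process described in the text.
to_error_rates <- function(comp, bias) {
  c(ep = (1 - comp) * bias,          # truth is 0, source guesses 1
    em = (1 - comp) * (1 - bias))    # truth is 1, source guesses 0
}
to_comp_bias <- function(ep, em) {
  stopifnot(ep + em < 1)             # otherwise competency would not be positive
  c(comp = 1 - ep - em, bias = ep / (ep + em))
}

rates <- to_error_rates(comp = 0.6, bias = 0.25)
print(rates)
back <- to_comp_bias(rates[["ep"]], rates[["em"]])
print(back)
```

Under this process the two error rates always sum to one minus the competency, which is why the mapping exists only when the total error rate is below 1.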

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
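As a small numerical check on the conditional nomination probability, the expression can be evaluated directly. The names nomination_prob, p_base, p_bias, and s below are hypothetical and chosen for this sketch; they are not arguments of bn.

```r
# Conditional nomination probability under a biased net model:
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k.
nomination_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Baseline probability 0.1; two bias events with probabilities 0.3 and 0.5,
# of which only the first can occur for this dyad (s = c(1, 0)).
p <- nomination_prob(p_base = 0.1, p_bias = c(0.3, 0.5), s = c(1, 0))
print(p)   # 1 - 0.9 * 0.7 = 0.37
```

Setting every element of s to zero recovers the baseline probability, which is a convenient sanity check on the formula.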

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes, which are often used for this purpose. These models are of the form

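$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$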

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
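As an illustration of the kind of index that can guide the choice of weight matrix, Moran's I can be computed in a few lines of base R. The sketch uses the textbook formula with a single weight matrix; the function name morans_i is an assumption of this example and the code is not taken from nacf.

```r
# Moran's I for a response vector y and weight (adjacency) matrix w:
# I = (n / sum(w)) * sum_ij w_ij (y_i - ybar)(y_j - ybar) / sum_i (y_i - ybar)^2
morans_i <- function(y, w) {
  z <- y - mean(y)
  n <- length(y)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Undirected path 1 - 2 - 3 - 4, with a response that trends along the path.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
y <- c(1, 2, 3, 4)
print(morans_i(y, w))
```

Neighbors on the path have similar responses, so the index is positive (1/3 for this example).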

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (the posterior medians correlate with the true false negative and false positive rates at roughly 0.92 and 0.97, respectively) and the network itself (which is recovered exactly here, and which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
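The sensitivity of the central graph to high error rates can be seen in a small base-R simulation. For binary graphs on a common vertex set, the graph minimizing total Hamming distance to the reports is the edgewise majority vote, which is what the sketch below computes; the simulation settings (21 informants, the two error rate scenarios, and all function names) are assumptions chosen for illustration and do not reproduce the example above.

```r
# Sketch: edgewise majority vote as a central graph estimator, evaluated
# under modest and under high false negative rates.
set.seed(7)
n <- 20; m <- 21                                   # vertices; informants (odd, so no ties)
g_true <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g_true) <- 0

report <- function(g, ep, em) {                    # one error-prone report of g
  p <- g * (1 - em) + (1 - g) * ep                 # Pr(an edge is reported)
  r <- matrix(rbinom(length(g), 1, p), nrow(g)); diag(r) <- 0
  r
}
majority <- function(reports) (Reduce(`+`, reports) > length(reports) / 2) * 1

acc <- function(ep, em) {
  reports <- replicate(m, report(g_true, ep, em), simplify = FALSE)
  off <- row(g_true) != col(g_true)                # score off-diagonal cells only
  mean((majority(reports) == g_true)[off])
}
acc_low  <- acc(ep = 0.05, em = 0.20)              # modest error rates
acc_high <- acc(ep = 0.05, em = 0.55)              # false negative rate above 0.5
print(c(low = acc_low, high = acc_high))
```

Once the false negative rate exceeds 0.5, a majority of informants is expected to miss each true edge, so adding informants makes the majority estimate worse rather than better for those edges.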

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical quantiles vs. sample quantiles); Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks. II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT Pixley JE (2004) ldquoA Structural Approach to the Representation of Life HistoryDatardquo Journal of Mathematical Sociology 28(2) 81ndash124

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


R> o2scent <- function(dat, tmaxdev = FALSE, ...) {
+     n <- NROW(dat)
+     if (tmaxdev)
+         return((n - 1) * choose(n - 1, 2))
+     odeg <- degree(dat, cmode = "outdegree")
+     choose(odeg, 2)
+ }
R> apply(g, 1, centralization, o2scent)

[1] 0.02160494 0.20370370 0.54012346 0.08950617 0.14506173

Thus, users can employ centralization "for free" when working with their own centrality routines, so long as they support the required calling argument.
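To make the role of the tmaxdev argument concrete, the following base R sketch computes a Freeman-style centralization score by hand: the total deviation from the maximum vertex score, divided by the largest total deviation attainable. The helper name centralization_sketch is hypothetical and is not part of sna; the out-star is used because it attains the maximum deviation assumed by o2scent above.

```r
# Freeman-style centralization: total deviation from the maximum score,
# divided by the largest total deviation attainable on n vertices.
# (Hypothetical helper for illustration; not the sna implementation.)
centralization_sketch <- function(scores, tmaxdev) {
  sum(max(scores) - scores) / tmaxdev
}

# Out-star on 5 vertices: vertex 1 sends a tie to every other vertex.
n <- 5
star <- matrix(0, n, n)
star[1, -1] <- 1

odeg <- rowSums(star)        # outdegree of each vertex
o2s <- choose(odeg, 2)       # out-two-star counts, as in o2scent

# The same theoretical maximum deviation that o2scent returns under tmaxdev = TRUE
result <- centralization_sketch(o2s, (n - 1) * choose(n - 1, 2))
result
```

Because the out-star is the most centralized configuration for this index, the score equals 1; any other graph on five vertices would give a smaller value.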

2.4. Connectivity and subgraph statistics

Connectivity, in its most general sense, refers to a range of properties relating to the ability of one vertex to reach another via traversal of edges. sna has a number of functions to compute connectivity-related statistics, and to identify associated graph features. Of these, component.dist is likely the most fundamental. Given one or more input graphs, component.dist identifies all (maximal) components, and provides associated information on membership and size distributions. Components may be selected based on standard notions of strong, weak, unilateral, or recursive connectedness (although it should be noted that unilaterally connected components may not be uniquely defined). The convenience functions is.connected, components, and component.largest can be used as front-ends to component.dist, returning (respectively) the connectedness of the graph as a whole, the number of observed components, and the largest component in the graph. The graph of pairwise connected vertices (or reachability graph) is returned by reachability, and provides another means of assessing connectivity. More precise information is contained in the geodesic distances between vertices, which can be computed (along with numbers of geodesics between pairs) by geodist. An example of how these concepts may be combined is provided by Fararo and Sunshine's (1964) structure statistics. Let $G = (V, E)$ be a (possibly directed) graph of order $N$, and let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The "structure statistics" of $G$ are then given by the series $s_0, \ldots, s_{N-1}$, where

$$s_i = N^{-2} \sum_{j=1}^{N} \sum_{k=1}^{N} I(d(j, k) \le i)$$

and $I$ is the standard indicator function. Intuitively, $s_i$ is the expected fraction of $G$ which lies within distance $i$ of a randomly chosen vertex. As such, the structure statistics provide a parsimonious description of global connectivity. (They are also of importance within biased net theory, since analytical results for the expectation of these statistics exist for certain models. See Fararo (1981, 1983); Skvoretz et al. (2004) for related results.)
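The definition of the structure statistics can be sketched directly in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names geodesic_dist and structure_stats are hypothetical, and geodesic distances are obtained with a plain Floyd-Warshall pass, which is adequate only for small graphs.

```r
# Geodesic distances via Floyd-Warshall; adj is a 0/1 adjacency matrix.
# (Hypothetical helper; sna users would call geodist instead.)
geodesic_dist <- function(adj) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)   # direct edges have length 1
  diag(d) <- 0                   # each vertex is at distance 0 from itself
  for (k in seq_len(n))
    for (i in seq_len(n))
      for (j in seq_len(n))
        if (d[i, k] + d[k, j] < d[i, j]) d[i, j] <- d[i, k] + d[k, j]
  d
}

# s_i = N^-2 * sum_j sum_k I(d(j, k) <= i), for i = 0, ..., N - 1
structure_stats <- function(adj) {
  n <- nrow(adj)
  d <- geodesic_dist(adj)
  s <- sapply(0:(n - 1), function(i) sum(d <= i) / n^2)
  names(s) <- 0:(n - 1)
  s
}

# Directed 4-cycle: 1 -> 2 -> 3 -> 4 -> 1
cyc <- matrix(0, 4, 4)
cyc[cbind(1:4, c(2:4, 1))] <- 1
structure_stats(cyc)
```

In the directed 4-cycle each vertex reaches exactly one new vertex per step, so the series rises by 1/4 at each distance until the whole graph is covered.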

At least since Davis and Leinhardt (1972), social network analysts have recognized the importance of subgraph frequencies as an indicator of underlying structural tendencies. This theory has been considerably enriched in recent decades (see, e.g., Frank and Strauss 1986; Pattison and Robins 2002), particularly with respect to the connection between edgewise dependence conditions and structural biases (see Wasserman and Robins (2005) for an approachable introduction). It has also been recognized that constraints on properties of small subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions consists of those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
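The logic of a CUG test can be sketched in a few lines of base R. The example below is only an illustration of the definitions above, with hypothetical helper names (dyad_census, cug_order) that are not part of sna. It conditions on order alone, so the null graph $H$ is uniform over digraphs on the same vertex set, which is equivalent to drawing each possible edge independently with probability 0.5.

```r
# Dyad census of a 0/1 adjacency matrix (diagonal ignored).
dyad_census <- function(adj) {
  ut <- upper.tri(adj)
  both <- adj[ut] + t(adj)[ut]          # 0, 1, or 2 edges per dyad
  c(Mut = sum(both == 2), Asym = sum(both == 1), Null = sum(both == 0))
}

# One-tailed CUG p-values, conditioning on order only.
cug_order <- function(adj, stat, reps = 1000) {
  n <- nrow(adj)
  obs <- stat(adj)
  sims <- replicate(reps, {
    h <- matrix(rbinom(n * n, 1, 0.5), n, n)   # uniform digraph of order n
    diag(h) <- 0
    stat(h)
  })
  c(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(1)
g <- matrix(0, 6, 6)
g[cbind(c(1, 2, 3), c(2, 1, 4))] <- 1   # one mutual dyad (1 <-> 2), one asymmetric (3 -> 4)
census <- dyad_census(g)
census
res <- cug_order(g, function(a) dyad_census(a)[["Mut"]])
res
```

Since the observed graph is very sparse relative to the order-only null, the lower-tail p-value for the mutual count should be small; conditioning on density as well would give a more informative baseline.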

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1,,], maxlen = 5, path.comembership = "bylength",
+     dyadic.tabulation = "bylength")$path.count

   Agg   v1   v2   v3   v4   v5   v6   v7   v8   v9  v10
1   35    8    3    9    2   10    9    3   10    8    8
2  119   40   10   47    8   59   47   13   56   39   38
3  346  155   41  180   35  223  185   52  211  149  153
4  791  457  130  504  114  601  527  163  572  425  462
5 1351  964  303 1000  282 1143 1061  375 1104  884  990

R> kcycle.census(g3[1,,], maxlen = 5,
+     cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43

R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+     g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.3333333
        1stQ: -0.06666667
        Med:  0
        Mean: -0.001288889
        3rdQ: 0.06666667
        Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.

This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
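The pipeline just described (pairwise positional distances followed by hierarchical clustering) can be mimicked in base R. The sketch below is an assumption-laden stand-in rather than the sna code: se_hamming is a hypothetical helper that uses Hamming distances between the row and column profiles of each vertex pair, leaving out the cells that involve the pair itself, in line with the definition of structural equivalence given earlier.

```r
# Hamming distance between the tie profiles of every vertex pair.
# (Hypothetical helper for illustration; not sna's sedist.)
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      keep <- setdiff(seq_len(n), c(i, j))            # third parties only
      d[i, j] <- sum(adj[i, keep] != adj[j, keep]) +  # differing out-ties
                 sum(adj[keep, i] != adj[keep, j])    # differing in-ties
    }
  }
  d
}

# Vertices 1 and 2 both send to 3 and 4; vertex 3 also sends to 5.
adj <- matrix(0, 5, 5)
adj[1, c(3, 4)] <- 1
adj[2, c(3, 4)] <- 1
adj[3, 5] <- 1

d <- se_hamming(adj)
d

# Cluster on the distances and cut at height 0 to recover exact equivalence classes.
hc <- hclust(as.dist(d), method = "complete")
cl <- cutree(hc, h = 0)
cl
```

Cutting at height 0 groups only exactly equivalent vertices (here 1 and 2); raising the cut height merges approximately equivalent vertices such as 3 and 4, which differ in a single tie.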

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes consist entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
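Density expansion as described here amounts to building a vertex-level probability matrix from the block densities and drawing independent Bernoulli edges. A minimal base R sketch follows; expand_density is a hypothetical name and the handling of the diagonal (loops removed) is an assumption of this example, not a statement about blockmodel.expand.

```r
# Expand a density blockmodel: dens[a, b] is the edge probability from
# class a to class b; sizes[a] is the number of vertices to create in class a.
# (Hypothetical helper for illustration only.)
expand_density <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)   # class label of each new vertex
  p <- dens[cls, cls, drop = FALSE]             # Bernoulli parameter matrix
  n <- length(cls)
  g <- matrix(rbinom(n * n, 1, p), n, n)        # independent edge draws
  diag(g) <- 0                                  # assume no loops
  g
}

set.seed(42)
dens <- matrix(c(0.9, 0.1,
                 0.0, 1.0), 2, 2, byrow = TRUE)
g <- expand_density(dens, sizes = c(3, 2))
g
```

Changing sizes alters the order of the generated graph without touching the image matrix, which is what allows simulated networks to differ in size from the one used to fit the blockmodel.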

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
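The boolean inner product is a one-liner in base R, which makes the point about loops easy to check. The function name compose below is a hypothetical stand-in used for illustration and does not reproduce the package operator's interface.

```r
# Boolean inner product of two adjacency matrices:
# (v, v') is an edge iff some v'' has (v, v'') in G and (v'', v') in G'.
compose <- function(a, b) (a %*% b > 0) * 1

# A single mutual dyad 1 <-> 2, with no loops.
g <- matrix(c(0, 1,
              1, 0), 2, 2, byrow = TRUE)

gg <- compose(g, g)
gg
```

Composing the mutual dyad with itself sends each vertex out and back again, so the result has a loop on every vertex and no off-diagonal ties.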

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G, H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V|\,(|V| - 1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G, G)$, and the graph correlation $\rho(G, H) = \mathrm{cov}(G, H) / \sqrt{\mathrm{cov}(G, G)\,\mathrm{cov}(H, H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
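Equation 3 and the associated correlation and Hamming distance translate directly into base R when applied to the off-diagonal cells of two adjacency matrices. The helper names below (offdiag, graph_cov, graph_cor, hamming) are hypothetical and serve only to illustrate the formulas for the directed, loopless case.

```r
# Off-diagonal cells of a directed adjacency matrix, as a vector.
offdiag <- function(a) a[row(a) != col(a)]

# Graph covariance as in Equation 3: the sum over ordered pairs (i, j)
# divided by |V|(|V| - 1), i.e., a mean over off-diagonal cells.
graph_cov <- function(g, h) {
  x <- offdiag(g)
  y <- offdiag(h)
  mean((x - mean(x)) * (y - mean(y)))
}

graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}

# Hamming distance: number of edge changes needed to take g into h.
hamming <- function(g, h) sum(offdiag(g) != offdiag(h))

# Directed path 1 -> 2 -> 3 -> 4 and its complement.
g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1
gc <- 1 - g

graph_cor(g, g)    # a graph is perfectly correlated with itself
graph_cor(g, gc)   # and perfectly anti-correlated with its complement
hamming(g, gc)     # every off-diagonal cell differs
```

Because the diagonal is excluded, the loops introduced by taking 1 - g play no role in any of the three quantities.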

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
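For very small graphs the optimal matching can be found by brute force, which helps clarify what the heuristic methods approximate. The base R sketch below enumerates every relabeling of one graph and keeps the smallest Hamming distance; perms, hamming, and struct_dist are hypothetical helpers, all vertices are assumed exchangeable, and the factorial cost makes this usable only for a handful of vertices.

```r
# All permutations of the vector v, one per row (small inputs only).
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, 1, length(v)))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

# Labeled Hamming distance, assuming both graphs have zero diagonals.
hamming <- function(g, h) sum(g != h)

# Structural distance by exhaustive search over relabelings of g.
struct_dist <- function(g, h) {
  ps <- perms(seq_len(nrow(g)))
  min(apply(ps, 1, function(o) hamming(g[o, o], h)))
}

g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1      # directed path 1 -> 2 -> 3 -> 4
o <- c(3, 1, 4, 2)
h <- g[o, o]                 # the same path under a different labeling

hamming(g, h)                # labeled distance: the edge sets do not overlap
struct_dist(g, h)            # structural distance: the graphs are isomorphic
```

The gap between the two values is the "labeling" component of the distance: it vanishes once the vertex matching is optimized.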

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
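The QAP null hypothesis is easy to state in code: permute the rows and columns of one adjacency matrix together, recompute the statistic, and compare the observed value with the resulting distribution. The base R sketch below does this for the graph correlation. The names offdiag, gcor_simple, and qap_cor are hypothetical, and the example graphs are simulated purely for illustration.

```r
# Off-diagonal cells of an adjacency matrix, as a vector.
offdiag <- function(a) a[row(a) != col(a)]

# Graph correlation on off-diagonal cells (directed, loopless case).
gcor_simple <- function(g, h) cor(offdiag(g), offdiag(h))

# QAP test: relabel the vertices of g at random (rows and columns together),
# holding h fixed, and recompute the correlation each time.
qap_cor <- function(g, h, reps = 1000) {
  n <- nrow(g)
  obs <- gcor_simple(g, h)
  sims <- replicate(reps, {
    o <- sample(n)
    gcor_simple(g[o, o], h)
  })
  list(obs = obs, p_upper = mean(sims >= obs), p_lower = mean(sims <= obs))
}

set.seed(123)
g <- matrix(rbinom(100, 1, 0.3), 10, 10)
diag(g) <- 0
h <- g
flip <- sample(100, 10)          # perturb a few cells so that h resembles g
h[flip] <- 1 - h[flip]
diag(h) <- 0

res <- qap_cor(g, h)
res
```

Permuting rows and columns jointly preserves the structure of g while breaking its alignment with h, which is why the test conditions on the entire observed structure rather than on summary statistics such as order or density.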


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173  0.680   0.320   0.503
x1           0.9411361  2.5628914  0.985   0.015   0.019
x2           4.1473292 63.2648084  1.000   0.000   0.000
x3           1.8630911  6.4436238  1.000   0.000   0.000
x4          -0.1757242  0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties. This is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
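As a rough guide to what a potential scale reduction measure reports, the following base R sketch computes the simple between/within-chain variance ratio for a single scalar quantity. It is a textbook-style approximation written for this illustration; the function name psrf and the simulated chains are assumptions, and the details may differ from those of potscalered.mcmc.

```r
# chains: a matrix with one column per chain and one row per retained draw.
psrf <- function(chains) {
  n <- nrow(chains)
  w <- mean(apply(chains, 2, var))   # mean within-chain variance
  b <- n * var(colMeans(chains))     # between-chain variance, scaled by n
  var_plus <- (n - 1) / n * w + b / n
  sqrt(var_plus / w)                 # values near 1 suggest the chains agree
}

set.seed(3)
# Four chains sampling the same distribution (well mixed).
mixed <- matrix(rnorm(2000), nrow = 500, ncol = 4)
# The same chains, with one chain shifted to a different region.
stuck <- sweep(mixed, 2, c(0, 0, 0, 5), "+")

r_mixed <- psrf(mixed)
r_stuck <- psrf(stuck)
```

Values close to 1 are consistent with convergence, while values well above 1 (as for the shifted chain here) indicate that the chains have not yet settled on a common distribution.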

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
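The conditional tie probability above is easy to evaluate directly. The sketch below is only an illustration of the formula: the function name bias_tie_prob and the numerical values are hypothetical, and nothing here reproduces the estimation performed by bn.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: baseline probability Pr(B_e)
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      vector of counts s_k of potential bias events for the (i, j) dyad
bias_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical values: two bias types, with one and two potential events.
p_tie <- bias_tie_prob(p_base = 0.05, p_bias = c(0.3, 0.2), s = c(1, 2))

# With no potential bias events, the probability reduces to the baseline.
p_none <- bias_tie_prob(p_base = 0.05, p_bias = c(0.3, 0.2), s = c(0, 0))
```

For these values, the tie fails to form only if the baseline event and all three potential bias events fail, which gives 1 - 0.95 * 0.7 * 0.8^2 = 0.5744.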

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

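\[ y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon, \tag{4} \]

\[ \varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu, \tag{5} \]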

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
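As an example of the kind of index mentioned above, Moran's I for a vertex attribute and a weight matrix can be computed in a few lines of base R. This is a hand-rolled sketch using the standard definition; the function name morans_i and the ring-shaped toy network are assumptions of the example, and it is not a substitute for nacf.

```r
# Moran's I: (n / sum(w)) * sum_ij w_ij (y_i - ybar)(y_j - ybar) / sum_i (y_i - ybar)^2
morans_i <- function(y, w) {
  n <- length(y)
  z <- y - mean(y)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy network (assumed): an undirected ring on six vertices.
n <- 6
w <- matrix(0, n, n)
for (i in seq_len(n)) {
  j <- i %% n + 1
  w[i, j] <- 1
  w[j, i] <- 1
}

# Perfectly alternating attribute: every neighbor holds the opposite value.
i_alternating <- morans_i(c(1, -1, 1, -1, 1, -1), w)

# Two contiguous blocks: most neighbors hold the same value.
i_blocks <- morans_i(c(1, 1, 1, -1, -1, -1), w)
```

On this ring the alternating pattern gives I = -1 (strong negative autocorrelation), while the two-block pattern gives I = 1/3.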


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

       e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

    Min    1stQ   Median Mean   3rdQ   Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

    Min       1stQ      Median    Mean      3rdQ      Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
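The data analytic estimators compared above are simple enough to write out by hand, which may help clarify why they behave differently under false negatives. The following base R sketch is an illustration only: the simulated reports, the flat 10% error rate, and all object names are assumptions, and the code does not call consensus. As with the example above, the LAS rules presume that the informants are the vertices themselves, so that informant i's own report on the (i, j) dyad is available.

```r
set.seed(7)
n <- 8

# True network (assumed) and informant reports: rep_arr[k, , ] is informant k's view.
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0
rep_arr <- array(0, dim = c(n, n, n))
for (k in seq_len(n)) {
  flip <- matrix(rbinom(n * n, 1, 0.1), n, n)  # each dyad misreported w.p. 0.1
  r <- abs(g - flip)
  diag(r) <- 0
  rep_arr[k, , ] <- r
}

# Central graph: dyad-wise majority vote across all informants.
central <- (apply(rep_arr, c(2, 3), mean) > 0.5) * 1

# Locally aggregated structures: use only the reports of the two parties involved.
sender <- matrix(0, n, n)    # i's own report on the (i, j) dyad
receiver <- matrix(0, n, n)  # j's own report on the (i, j) dyad
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sender[i, j] <- rep_arr[i, i, j]
    receiver[i, j] <- rep_arr[j, i, j]
  }
}
las_union <- pmax(sender, receiver)  # tie if either party reports it
las_inter <- pmin(sender, receiver)  # tie only if both parties report it

acc <- c(central = mean(central == g),
         union = mean(las_union == g),
         intersection = mean(las_inter == g))
```

Because the intersection rule requires both parties to report a tie, a single false negative removes the edge, whereas the majority rule pools all informants and is correspondingly harder to mislead.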

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems. Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

subgraphs have substantial implications for global structure (see, e.g., Faust (2007) and references), a connection which also motivates the use of such measures. Most fundamental of the subgraph statistics are those of the dyad census, i.e., the respective counts of mutual, asymmetric, and null dyads. The eponymous dyad.census function returns these quantities (with mutuality returning only the number of mutual dyads). The triad census, or frequencies of each triadic isomorphism class observed as induced subgraphs of $G$, is similarly computed by triad.census. In the undirected case, there are four such classes, versus 16 for the directed case; it is thus important to specify the directedness of one's data when employing this routine (or triad.classify, which can be used to classify specific triads). Similar counts of paths and cycles may be obtained using kpath.census and kcycle.census. In addition to raw counts, co-membership and incidence statistics are given by vertex (where requested). Users should be aware that path and cycle census enumeration are NP-complete problems in the general case, and hence counts of longer paths or cycles are often impractical. Short (or even mid-length) cases can usually be calculated for sufficiently sparse graphs, however.

Interpretation of subgraph census statistics is often aided by comparison with baseline models (Mayhew 1984), as in the case of conditional uniform graph (CUG) tests. The p-value for a one-tailed CUG test of statistic $t$ for graph $G$ is given by $\Pr(t(H) \ge t(G))$ or $\Pr(t(H) \le t(G))$ (for the upper and lower tests, respectively), where $H$ is a random graph drawn uniformly given conditioning statistics $s(H) = s(G), s'(H) = s'(G), \ldots$. Conditioning on the order of $G$ is routine; the number of edges, dyad census, and degree distribution are also widely used. A somewhat weaker family of null distributions are those which satisfy the conditions $\mathbf{E}s(H) = s(G), \mathbf{E}s'(H) = s'(G), \ldots$ for some $s, s', \ldots$. These are equivalent to the graph distributions arising from the MLE for an exponential random graph model with sufficient statistics $s, s', \ldots$; the homogeneous Bernoulli graph with parameter $p$ equal to the density of $G$ is a trivial example, but more complex families are possible. Within sna, the cugtest wrapper function can be used to facilitate such comparisons. Using the gliop routine, cugtest can be used to compare functions of statistics on graph pairs (e.g., difference in triangle counts) to those expected based on one or more simple null models. (Compare to qaptest, discussed in Section 2.6.)
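The one-tailed p-values defined above can be approximated by simulation. The following is a minimal base R sketch, not the implementation used by cugtest: it takes the mutual-dyad count as the statistic $t$ and conditions on order and number of edges. The helper names dyad_census and r_cug_edges are hypothetical and are not part of sna.

```r
# Base R sketch of a dyad census and a Monte Carlo CUG test.
# All helper names are illustrative; none of them belong to sna.

# Count mutual, asymmetric, and null dyads of a 0/1 adjacency matrix.
dyad_census <- function(A) {
  s <- (A + t(A))[upper.tri(A)]   # 2 = mutual, 1 = asymmetric, 0 = null
  c(Mut = sum(s == 2), Asym = sum(s == 1), Null = sum(s == 0))
}

# Draw a loopless digraph uniformly, conditional on order n and edge count m.
r_cug_edges <- function(n, m) {
  A <- matrix(0, n, n)
  A[sample(which(row(A) != col(A)), m)] <- 1
  A
}

set.seed(1)
n <- 10
G <- matrix(0, n, n)
ring <- cbind(1:n, c(2:n, 1))
G[ring] <- 1                      # i -> i + 1
G[ring[, 2:1]] <- 1               # i + 1 -> i, so every tie is reciprocated

t_obs <- dyad_census(G)["Mut"]
t_rep <- replicate(1000, dyad_census(r_cug_edges(n, sum(G)))["Mut"])

p_upper <- mean(t_rep >= t_obs)   # estimate of Pr(t(H) >= t(G))
p_lower <- mean(t_rep <= t_obs)   # estimate of Pr(t(H) <= t(G))
```

Because the observed graph here is fully reciprocated, the upper-tail estimate should be very small; conditioning on other statistics would only require a different generator in place of r_cug_edges.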

Example

To illustrate the use of the above measures, we apply them to draws from a series of biased net processes. (See Section 2.7 for a discussion of the biased net model.) We begin with a low-density Bernoulli graph model, adding first reciprocity and then triad formation biases. As can be seen, varying the types of biases specified within the model alters the nature of the resulting structures, and hence their subgraph and connectivity properties.

R> g1 <- rgbn(50, 10, param = list(pi = 0, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g1), 2, mean)

  Mut  Asym  Null
 1.00 12.84 31.16

R> apply(triad.census(g1), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
40.16 48.48  3.50  5.52  5.80  9.60  1.94  1.86  1.84  0.72  0.12  0.08  0.08
 120C   210   300
 0.30  0.00  0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)

  Mut  Asym  Null
 8.84  9.26 26.90

R> apply(triad.census(g2), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60

R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)

  Mut  Asym  Null
 8.94 20.44 15.62

R> apply(triad.census(g3), 2, mean)

  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50

R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count

   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990

R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count

  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1, , ])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.299
        p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:    -0.3333333
                1stQ:   -0.06666667
                Med:     0
                Mean:   -0.001288889
                3rdQ:    0.06666667
                Max:     0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
        p(f(rnd) >= f(d)): 0.967
        p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
        Test Value (f(d)): 0.04444444
        Replications: 1000
        Distribution Summary:
                Min:    -0.06666667
                1stQ:    0.1555556
                Med:     0.2222222
                Mean:    0.2215333
                3rdQ:    0.2888889
                Max:     0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
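The definition above can be checked directly on an adjacency matrix. The sketch below is a plain base R illustration, with struct_equiv a hypothetical helper rather than an sna function; it compares in- and out-neighborhoods after setting aside the two vertices being compared.

```r
# TRUE iff v and w have identical neighborhoods once v and w themselves
# are set aside. Both rows (out-ties) and columns (in-ties) are compared,
# so the same check covers the directed case.
struct_equiv <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  all(A[v, others] == A[w, others]) &&   # same out-neighbors
    all(A[others, v] == A[others, w])    # same in-neighbors
}

# Undirected star: vertex 1 is the hub, vertices 2..5 are pendants.
A <- matrix(0, 5, 5)
A[1, 2:5] <- 1
A[2:5, 1] <- 1

struct_equiv(A, 2, 3)   # pendants attached to the same hub
struct_equiv(A, 1, 2)   # hub versus pendant
```

In this star, any two pendants share the hub as their only alter and so occupy one position, while the hub forms a position of its own.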

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
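The distance-then-cluster workflow just described can be imitated in base R. This is a rough sketch only: se_hamming is a hypothetical helper, and its rule of excluding the two compared vertices from each comparison is an assumption of the sketch that need not match how sedist treats the diagonal.

```r
# Pairwise Hamming distances between vertices' tie profiles
# (rows and columns of the adjacency matrix), ignoring the pair itself.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      others <- setdiff(seq_len(n), c(i, j))
      D[i, j] <- D[j, i] <-
        sum(A[i, others] != A[j, others]) + sum(A[others, i] != A[others, j])
    }
  }
  D
}

# Undirected star: vertex 1 is the hub, vertices 2..5 are pendants.
A <- matrix(0, 5, 5)
A[1, 2:5] <- 1
A[2:5, 1] <- 1

D <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")   # standard R clustering
positions <- cutree(hc, k = 2)                  # cut into two classes
```

Here the four pendants are at distance zero from one another and so fall into a single class, separate from the hub; with real data one would inspect the dendrogram before choosing where to cut.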

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
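For the density block type, the reduction step amounts to averaging adjacency matrix cells within each pair of classes. A minimal base R sketch follows; block_density is a hypothetical helper, not the blockmodel function, and it assumes a loopless 0/1 matrix and a membership vector supplied by the user.

```r
# Density block image: mean cell value for each (row class, column class)
# pair, neglecting the diagonal. A class with a single member yields NaN
# on its own diagonal block, since no off-diagonal cells exist there.
block_density <- function(A, classes) {
  k <- sort(unique(classes))
  diag(A) <- NA
  B <- matrix(NA_real_, length(k), length(k),
              dimnames = list(as.character(k), as.character(k)))
  for (a in seq_along(k)) {
    for (b in seq_along(k)) {
      B[a, b] <- mean(A[classes == k[a], classes == k[b]], na.rm = TRUE)
    }
  }
  B
}

# Six vertices in two classes; every class 1 vertex sends ties to class 2.
classes <- c(1, 1, 1, 2, 2, 2)
A <- matrix(0, 6, 6)
A[1:3, 4:6] <- 1

B <- block_density(A, classes)
```

Because the two classes in this toy graph are exact structural equivalence classes, every block density is either 0 or 1, as the text notes.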

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
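Density expansion as described here can be sketched in a few lines of base R. The function expand_density_blocks is hypothetical (it is not blockmodel.expand); it assumes a density image with no missing entries and suppresses loops.

```r
# Expand a density block image B into one random digraph.
# sizes[i] gives the number of vertices to create in class i.
expand_density_blocks <- function(B, sizes) {
  cls <- rep(seq_along(sizes), times = sizes)
  P <- B[cls, cls, drop = FALSE]            # dyad-level edge probabilities
  n <- length(cls)
  A <- matrix(rbinom(n * n, 1, P), n, n)    # independent Bernoulli draws
  diag(A) <- 0                              # no loops
  A
}

set.seed(3)
B <- matrix(c(0, 0, 1, 0), 2, 2)            # class 1 sends to class 2 only
A_new <- expand_density_blocks(B, sizes = c(3, 4))
```

The class sizes passed to the expansion are free to differ from those of the graph that produced the image, which is what allows simulated networks of a different order.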

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
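The boolean inner product mentioned above is easy to write out explicitly. The helper compose below is a hypothetical base R stand-in used only to illustrate the definition, including the way loops can arise.

```r
# Boolean matrix product: (i, j) is an edge of the composition iff some k
# has (i, k) in the first graph and (k, j) in the second.
compose <- function(A, B) ((A %*% B) > 0) * 1

# Directed path 1 -> 2 -> 3 composed with itself gives the single edge 1 -> 3.
P <- matrix(0, 3, 3)
P[1, 2] <- P[2, 3] <- 1
PP <- compose(P, P)

# Two loopless graphs whose composition contains a loop at vertex 1.
G1 <- matrix(0, 2, 2); G1[1, 2] <- 1   # 1 -> 2
G2 <- matrix(0, 2, 2); G2[2, 1] <- 1   # 2 -> 1
L <- compose(G1, G2)
```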

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network, without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

\[
\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \tag{3}
\]


where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
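Equation 3 and the quantities derived from it can be computed directly from two adjacency matrices on the same vertex set. The base R helpers below (offdiag, graph_cov, graph_cor, hamming) are hypothetical names chosen for this sketch; they treat the graphs as directed and loopless.

```r
# Off-diagonal cells, i.e., the |V|(|V| - 1) ordered pairs (i, j), i != j.
offdiag <- function(A) A[row(A) != col(A)]

# Graph covariance as in Equation 3: mean cross-product of centered edges.
graph_cov <- function(A, B) {
  a <- offdiag(A)
  b <- offdiag(B)
  mean((a - mean(a)) * (b - mean(b)))
}

# Graph correlation: covariance scaled by the two graph variances.
graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}

# Hamming distance: number of ordered pairs on which the graphs disagree.
hamming <- function(A, B) sum(offdiag(A) != offdiag(B))

set.seed(11)
n <- 8
A <- matrix(rbinom(n * n, 1, 0.4), n, n); diag(A) <- 0
B <- matrix(rbinom(n * n, 1, 0.4), n, n); diag(B) <- 0
r_AB <- graph_cor(A, B)
```

Since the normalizing constants cancel in the ratio, graph_cor agrees with the ordinary product-moment correlation of the two vectorized off-diagonal edge sets.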

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
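For very small graphs, the optimization over matchings can be carried out exhaustively, which makes the idea concrete. The sketch below is a base R illustration in which every relabeling of the vertex set is treated as permissible (an assumption of the sketch); perms and struct_hamming are hypothetical helpers, not the heuristic routines provided by sna.

```r
# All permutations of a vector, built recursively (only viable for small n).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Structural Hamming distance: the minimum labeled Hamming distance over
# every relabeling of the first graph's vertices.
struct_hamming <- function(A, B) {
  min(vapply(perms(seq_len(nrow(A))),
             function(p) sum(A[p, p] != B), numeric(1)))
}

set.seed(5)
n <- 5
A <- matrix(rbinom(n * n, 1, 0.4), n, n); diag(A) <- 0
q <- sample(n)
B <- A[q, q]                       # an isomorphic, relabeled copy of A

d_labeled <- sum(A != B)           # distance under the given labeling
d_struct <- struct_hamming(A, B)   # distance after optimal relabeling
```

The number of relabelings grows factorially with the order of the graph, which is why heuristic search is used for anything beyond toy cases.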

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
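The QAP null hypothesis described above can be simulated by jointly permuting the rows and columns of one graph while holding the other fixed. The following base R sketch uses the graph correlation as the test statistic; qap_cor and offdiag are hypothetical helpers and do not reproduce the interface of qaptest.

```r
# Off-diagonal cells of an adjacency matrix (ordered pairs, no loops).
offdiag <- function(A) A[row(A) != col(A)]

# QAP test of the correlation between two graphs on the same vertex set:
# relabel the vertices of A at random and recompute the statistic.
qap_cor <- function(A, B, reps = 1000) {
  n <- nrow(A)
  obs <- cor(offdiag(A), offdiag(B))
  sims <- replicate(reps, {
    p <- sample(n)                        # random relabeling of vertices
    cor(offdiag(A[p, p]), offdiag(B))
  })
  list(obs = obs,
       p_upper = mean(sims >= obs),       # estimate of Pr(f(rnd) >= f(d))
       p_lower = mean(sims <= obs))       # estimate of Pr(f(rnd) <= f(d))
}

set.seed(7)
n <- 12
A <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(A) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)
B <- abs(A - flip); diag(B) <- 0          # noisy copy of A
res <- qap_cor(A, B)
```

Permuting rows and columns together preserves the structure of A exactly, so the reference distribution reflects only the assignment of vertices to positions.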


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

        Null Hypothesis: qap
        Replications: 1000
        Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
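When the false positive and false negative rates are given as known, the posterior probability of a single edge under this kind of reporting process follows from Bayes' rule alone. The base R sketch below illustrates that special case; post_edge is a hypothetical helper, reports are assumed independent given the true edge state, and nothing here reproduces the Gibbs sampler used by bbnam.

```r
# Posterior probability that an edge exists, given 0/1 reports y from
# several informants with known false positive (ep) and false negative
# (em) rates and a Bernoulli prior on the edge.
post_edge <- function(y, ep, em, prior = 0.5) {
  lik_present <- prod(ifelse(y == 1, 1 - em, em))   # edge truly present
  lik_absent  <- prod(ifelse(y == 1, ep, 1 - ep))   # edge truly absent
  prior * lik_present / (prior * lik_present + (1 - prior) * lik_absent)
}

# Three informants with identical error rates; two report the tie.
y  <- c(1, 1, 0)
ep <- rep(0.1, 3)
em <- rep(0.3, 3)
p_edge <- post_edge(y, ep, em)
```

With these illustrative rates, the likelihoods are 0.7 * 0.7 * 0.3 under presence and 0.1 * 0.1 * 0.9 under absence, so two positive reports out of three already favor the edge strongly.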

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
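The conditional edge probability above is simple to evaluate numerically. In the base R sketch below, pr_edge is a hypothetical helper; the values 0.17 and 0.5 echo the d and pi settings of the earlier rgbn example, and reading them as a baseline probability and a single reciprocity bias is an interpretive assumption made for illustration.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# p_base: baseline probability; p_bias: bias event probabilities;
# s: number of times each bias event could have occurred for dyad (i, j).
pr_edge <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

p_none  <- pr_edge(0.17, p_bias = 0.5, s = 0)   # no bias opportunity
p_recip <- pr_edge(0.17, p_bias = 0.5, s = 1)   # one bias opportunity
```

With no bias opportunity the expression reduces to the baseline probability, while a single opportunity at probability 0.5 raises it to 1 - 0.83 * 0.5 = 0.585.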

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes, which are often used for this purpose. These models are of the form

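\[
y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \tag{4}
\]

\[
\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \tag{5}
\]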

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
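Equations 4 and 5 can be solved for $\varepsilon$ and $y$ whenever the relevant matrices are invertible, which gives a direct way to simulate from the model. The base R sketch below uses a single AR and a single MA weight matrix ($w = z = 1$); the row-normalized random weight matrix and all parameter values are assumptions chosen for illustration, and the code does not call lnam.

```r
set.seed(42)
n <- 30

# A single interaction matrix, row-normalized so that each row sums to
# one (vertices without out-ties keep a row of zeros).
W <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(W) <- 0
W <- W / pmax(rowSums(W), 1)
Z <- W                               # same weights for the MA term

theta <- 0.4                         # AR parameter
psi   <- 0.3                         # MA parameter
beta  <- c(1, -2)
X     <- cbind(1, rnorm(n))          # intercept plus one covariate
nu    <- rnorm(n, sd = 0.5)          # iid disturbances

I_n <- diag(n)
eps <- solve(I_n - psi * Z, nu)                  # Equation 5 solved for eps
y   <- solve(I_n - theta * W, X %*% beta + eps)  # Equation 4 solved for y
```

Row normalization keeps the spectral radius of the weight matrix at or below one, so with parameters inside (-1, 1) both linear systems have a unique solution.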


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 1.5, 2.5)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116 Med: 0.9992194 IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
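The reason the region em + ep > 1 is called "perverse" can be seen from a short calculation. Under the reporting model used in the simulation above, an informant reports a tie with probability 1 - em when the tie exists and ep when it does not; if em + ep > 1, then 1 - em < ep, so a positive report becomes evidence against the tie. The base R sketch below illustrates this; the function names `p_report` and `lr_positive` are hypothetical helpers introduced only for this illustration.

```r
# Probability that an informant reports a tie, given the true state g (0 or 1),
# false negative rate em, and false positive rate ep.
p_report <- function(g, em, ep) g * (1 - em) + (1 - g) * ep

# Likelihood ratio of a positive report:
# P(report | tie present) / P(report | tie absent).
lr_positive <- function(em, ep) p_report(1, em, ep) / p_report(0, em, ep)

# Error rates near the means used in the example: a report supports the tie.
lr_typical <- lr_positive(em = 0.375, ep = 0.038)

# Error rates in the "perverse" region (em + ep > 1): the ratio falls below 1,
# so a reported tie is more likely when the tie is absent.
lr_perverse <- lr_positive(em = 0.7, ep = 0.5)
```

The ratio equals 1 exactly when em + ep = 1, at which point reports carry no information about the underlying tie.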

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
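The difference between these aggregation rules can be sketched in base R. The code below is an illustration under stated assumptions, not sna's implementation: it assumes each vertex is also an informant, a single common pair of error rates, a strict majority threshold for the central graph, and a hypothetical helper `las` that builds the locally aggregated structure from the reports of the two endpoints of each potential tie.

```r
set.seed(2)
n <- 8                                   # vertices double as informants
g <- matrix(rbinom(n * n, 1, 0.4), n, n) # hypothetical "true" digraph
diag(g) <- 0
em <- 0.3                                # common false negative rate (assumed)
ep <- 0.05                               # common false positive rate (assumed)

# dat[i, , ] holds informant i's error-prone report of the whole network.
dat <- array(0, dim = c(n, n, n))
for (i in seq_len(n)) {
  r <- matrix(rbinom(n * n, 1, g * (1 - em) + (1 - g) * ep), n, n)
  diag(r) <- 0
  dat[i, , ] <- r
}

# Central graph sketch: keep a tie if a strict majority of informants report it.
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# LAS sketch: the (j, k) tie is decided by the reports of j and k alone.
las <- function(dat, rule = c("union", "intersection")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (j in seq_len(n)) for (k in seq_len(n)) {
    if (j == k) next
    sender   <- dat[j, j, k]             # what j says about j -> k
    receiver <- dat[k, j, k]             # what k says about j -> k
    out[j, k] <- if (rule == "union") max(sender, receiver) else min(sender, receiver)
  }
  out
}

# Proportion of cells recovered correctly by each rule.
acc <- c(intersection = mean(las(dat, "intersection") == g),
         union        = mean(las(dat, "union") == g),
         central      = mean(central == g))
```

Because the intersection rule requires both endpoints to report a tie, a single false negative removes the tie, which is why it suffers most when false negatives dominate.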

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.)

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments


120C  210  300
0.30 0.00 0.00

R> g2 <- rgbn(50, 10, param = list(pi = 0.5, sigma = 0, rho = 0, d = 0.17))
R> apply(dyad.census(g2), 2, mean)
  Mut  Asym  Null
 8.84  9.26 26.90
R> apply(triad.census(g2), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
25.46 27.28 23.36  1.86  2.40  4.22  8.26 11.46  0.66  0.22  9.34  0.52  0.74
 120C   210   300
 1.34  2.28  0.60
R> g3 <- rgbn(50, 10, param = list(pi = 0.0, sigma = 0.25, rho = 0, d = 0.17))
R> apply(dyad.census(g3), 2, mean)
  Mut  Asym  Null
 8.94 20.44 15.62
R> apply(triad.census(g3), 2, mean)
  003   012   102  021D  021U  021C  111D  111U  030T  030C   201  120D  120U
 4.66 22.62 10.06  4.82  5.00 12.74 10.78  9.02  9.72  2.56  3.26  3.88  3.60
 120C   210   300
 8.40  7.38  1.50
R> kpath.census(g3[1, , ], maxlen = 5, path.comembership = "bylength",
+    dyadic.tabulation = "bylength")$path.count
   Agg  v1  v2   v3  v4   v5   v6  v7   v8  v9 v10
1   35   8   3    9   2   10    9   3   10   8   8
2  119  40  10   47   8   59   47  13   56  39  38
3  346 155  41  180  35  223  185  52  211 149 153
4  791 457 130  504 114  601  527 163  572 425 462
5 1351 964 303 1000 282 1143 1061 375 1104 884 990
R> kcycle.census(g3[1, , ], maxlen = 5,
+    cycle.comembership = "bylength")$cycle.count
  Agg v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
2   9  2  1  2  0  3  2  0  4  3   1
3  24  7  1 11  0 15  9  2 12  8   7
4  42 16  1 23  2 32 26  3 30 19  16
5  72 39  5 48  8 60 54 10 57 36  43


R> component.dist(g3[1, , ])
$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1, , ])
   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.
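The logic of a CUG test conditioning only on order can be sketched in a few lines of base R: the uniform distribution over all digraphs of a given order is the same as drawing each edge independently with probability 0.5, so the null distribution of any statistic can be approximated by simulation. The sketch below is an illustration rather than sna's implementation; `recip` (the proportion of dyads that are mutual or null) and `rdigraph` are hypothetical helpers, and the signed difference in reciprocities is used as the statistic.

```r
set.seed(3)

# Dyadic reciprocity: proportion of dyads that are symmetric (mutual or null).
recip <- function(a) {
  ut <- upper.tri(a)
  mean(a[ut] == t(a)[ut])
}

# Random digraph with independent edges and no loops.
rdigraph <- function(n, p = 0.5) {
  a <- matrix(rbinom(n * n, 1, p), n, n)
  diag(a) <- 0
  a
}

n  <- 10
a1 <- rdigraph(n, 0.2)              # two stand-in "observed" graphs
a2 <- rdigraph(n, 0.2)
obs <- recip(a1) - recip(a2)        # observed test statistic

# Null distribution, conditioning on order only (edge probability 0.5).
null <- replicate(1000, recip(rdigraph(n)) - recip(rdigraph(n)))

p_upper <- mean(null >= obs)        # estimate of p(f(rnd) >= f(d))
p_lower <- mean(null <= obs)        # estimate of p(f(rnd) <= f(d))
```

Conditioning on density as well would require drawing the replicate graphs with the same number of edges as the observed graphs, which changes the reference distribution and hence the p-values.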

R> g4 <- g1[1:2, , ]
R> g4[2, , ] <- g2[1, , ]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
  p(f(rnd) >= f(d)): 0.299
  p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary:
    Min:  -0.3333333
    1stQ: -0.06666667
    Med:  0
    Mean: -0.001288889
    3rdQ: 0.06666667
    Max:  0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
  p(f(rnd) >= f(d)): 0.967
  p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
  Test Value (f(d)): 0.04444444
  Replications: 1000
  Distribution Summary:
    Min:  -0.06666667
    1stQ: 0.1555556
    Med:  0.2222222
    Mean: 0.2215333
    3rdQ: 0.2888889
    Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
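The definition can be checked directly on an adjacency matrix. The following base R sketch implements the directed form of the condition stated above (equal in- and out-neighborhoods once the two vertices themselves are excluded); `struct_equiv` is a hypothetical helper written for this illustration and is not an sna function.

```r
# TRUE iff v and w send ties to, and receive ties from, exactly the same
# third parties in the adjacency matrix a.
struct_equiv <- function(a, v, w) {
  others <- setdiff(seq_len(nrow(a)), c(v, w))
  all(a[v, others] == a[w, others]) &&   # same out-neighbors
    all(a[others, v] == a[others, w])    # same in-neighbors
}

# Small example: vertices 1 and 2 both send to 3 and both receive from 4.
a <- matrix(0, 4, 4)
a[1, 3] <- a[2, 3] <- 1
a[4, 1] <- a[4, 2] <- 1

se_12 <- struct_equiv(a, 1, 2)   # vertices 1 and 2 occupy the same position
se_13 <- struct_equiv(a, 1, 3)   # vertices 1 and 3 do not

# Exchanging structurally equivalent vertices is an automorphism:
# relabeling 1 <-> 2 leaves the adjacency matrix unchanged.
p <- c(2, 1, 3, 4)
is_automorphism <- identical(a[p, p], a)
```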

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
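The distance-then-cluster workflow described above can be sketched in base R. This is an illustrative stand-in, not the code behind sedist or equiv.clust: `se_hamming` is a hypothetical helper that counts, for each pair of vertices, the mismatches between their rows and columns (ignoring entries that involve the pair itself), and the result is passed to hclust.

```r
# Hamming-style structural distance between every pair of vertices.
se_hamming <- function(a) {
  n <- nrow(a)
  d <- matrix(0, n, n)
  for (v in seq_len(n)) for (w in seq_len(n)) {
    if (v == w) next
    others <- setdiff(seq_len(n), c(v, w))
    d[v, w] <- sum(a[v, others] != a[w, others]) +   # out-tie mismatches
               sum(a[others, v] != a[others, w])     # in-tie mismatches
  }
  d
}

set.seed(4)
a <- matrix(rbinom(64, 1, 0.3), 8, 8)
diag(a) <- 0
# Make vertex 2 an exact structural copy of vertex 1.
a[2, ] <- a[1, ]
a[, 2] <- a[, 1]
a[1, 2] <- a[2, 1] <- 0
diag(a) <- 0

d  <- se_hamming(a)
hc <- hclust(as.dist(d), method = "complete")  # hierarchical clustering
classes <- cutree(hc, k = 3)                   # cut into three candidate positions
```

A distance of zero identifies exactly equivalent vertices; small positive distances indicate approximate equivalence, and the dendrogram shows how much merging is needed to reach a given number of positions.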

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the i, jth block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
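As a concrete illustration of the density block type, the base R sketch below reduces an adjacency matrix to a class-by-class density matrix given a membership vector. It is a simplified stand-in for the reduction step, not the blockmodel function itself; `block_density` is a hypothetical helper, and diagonal cells are excluded from within-class blocks.

```r
# Mean cell value within each (row class, column class) block of a.
block_density <- function(a, memb) {
  classes <- sort(unique(memb))
  k <- length(classes)
  dens <- matrix(NA_real_, k, k, dimnames = list(classes, classes))
  for (r in seq_len(k)) for (s in seq_len(k)) {
    rows <- which(memb == classes[r])
    cols <- which(memb == classes[s])
    blk <- a[rows, cols, drop = FALSE]
    if (r == s) {
      off <- row(blk) != col(blk)          # neglect the diagonal (self-ties)
      dens[r, s] <- if (any(off)) mean(blk[off]) else NA_real_
    } else {
      dens[r, s] <- mean(blk)
    }
  }
  dens
}

# Two positions: every member of class 1 sends ties to every member of class 2.
a <- matrix(0, 4, 4)
a[1:2, 3:4] <- 1
memb <- c(1, 1, 2, 2)
dens <- block_density(a, memb)
```

For exactly structurally equivalent classes, as here, every block density is 0 or 1, matching the observation that such blocks consist entirely of 1s or 0s.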

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
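Expansion of a density blockmodel amounts to two steps: replicate each block density across the dyads belonging to that block, then draw each edge independently with that probability. A minimal base R sketch follows; `expand_density` is a hypothetical helper for illustration and is not the blockmodel.expand function.

```r
# Draw one digraph from a block density matrix, with sizes[r] vertices
# assigned to class r.
expand_density <- function(dens, sizes) {
  memb <- rep(seq_along(sizes), sizes)     # class membership of each new vertex
  p <- dens[memb, memb, drop = FALSE]      # per-dyad edge probabilities
  n <- length(memb)
  a <- matrix(rbinom(n * n, 1, p), n, n)   # independent Bernoulli draws
  diag(a) <- 0                             # no loops
  a
}

# Image with two classes in which class 1 always sends to class 2.
dens <- matrix(c(0, 0, 1, 0), 2, 2)
set.seed(5)
a_new <- expand_density(dens, sizes = c(3, 2))   # 5 vertices: 3 + 2

# Repeated draws give a Monte Carlo sample under the block image.
sims <- replicate(20, sum(expand_density(dens, c(3, 2))))
```

Because the class sizes are arguments, the expanded graph need not have the same order as the graph from which the densities were estimated.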

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
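The boolean inner product can be written in one line of base R, which also makes the remark about loops easy to verify. The function `compose` below is a hypothetical helper for illustration; it is not the package's composition operator.

```r
# Boolean inner product: (v, v') is an edge of the composition iff some v''
# has (v, v'') in the first graph and (v'', v') in the second.
compose <- function(a, b) ((a %*% b) > 0) * 1

# Two loopless digraphs on three vertices.
a <- matrix(0, 3, 3); a[1, 2] <- 1; a[2, 3] <- 1   # G:  1 -> 2, 2 -> 3
b <- matrix(0, 3, 3); b[2, 1] <- 1; b[3, 1] <- 1   # G': 2 -> 1, 3 -> 1

ab <- compose(a, b)
# 1 -> 2 in G followed by 2 -> 1 in G' yields the loop (1, 1);
# 2 -> 3 in G followed by 3 -> 1 in G' yields the edge (2, 1).
```

Neither input has a nonzero diagonal, yet the composition does, so setting the diagonal to zero after composing would discard a real feature of the result.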

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)}, \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V|\,(|V|-1))^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
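As a check on the definitions, the graph covariance of Equation 3, the graph correlation, and the Hamming distance can be computed by hand in base R. The function names below (graph_cov, graph_cor, hamming) are hypothetical and are not part of sna; the sketch assumes loopless digraphs on a common vertex set, so that only off-diagonal cells enter the sums.

```r
# Graph covariance per Equation 3: average over the |V|(|V| - 1)
# off-diagonal cells of the product of centered edge variables
graph_cov <- function(a, b) {
  off <- row(a) != col(a)
  x <- a[off]; y <- b[off]
  mean((x - mean(x)) * (y - mean(y)))
}
graph_cor <- function(a, b)
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))

# Hamming distance: number of off-diagonal cells on which the graphs differ
hamming <- function(a, b) {
  off <- row(a) != col(a)
  sum(a[off] != b[off])
}

g <- matrix(0, 4, 4)
g[1, 2] <- 1; g[2, 3] <- 1; g[3, 4] <- 1
h <- g
h[4, 1] <- 1                     # h differs from g by one added edge

hamming(g, h)                    # 1
graph_cor(g, g)                  # 1
graph_cor(g, 1 - g)              # -1 (complement graph)
```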

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
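The following base R sketch illustrates this workflow on a simulated set of graphs, without calling sna. The data are hypothetical (six noisy copies of one random digraph), and the Hamming distances and graph correlations are computed directly from vectorized off-diagonal cells rather than via sna's own routines.

```r
set.seed(42)
n <- 8; m <- 6
base <- matrix(rbinom(n * n, 1, 0.3), n, n)
graphs <- lapply(1:m, function(k) {
  flip <- matrix(rbinom(n * n, 1, 0.1), n, n)   # random edge toggles
  abs(base - flip)
})

off <- row(base) != col(base)
edges <- sapply(graphs, function(g) g[off])     # one column per graph

# For 0/1 edge variables, Manhattan distance equals Hamming distance
hd <- as.matrix(dist(t(edges), method = "manhattan"))

coords <- cmdscale(hd, k = 2)                   # metric MDS of the graph set
tree <- hclust(as.dist(hd))                     # hierarchical clustering

# Network PCA: eigendecomposition of the graph correlation matrix
pca <- eigen(cor(edges), symmetric = TRUE)
```

The Pearson correlation of two vectorized edge sets coincides with the graph correlation, since the normalizing constant cancels in the ratio.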

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
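To make the QAP null hypothesis concrete, the sketch below implements a simple permutation test for the graph correlation in base R. It is an illustration of the relabeling idea, not sna's implementation; the function name qap_cor_test, its arguments, and the simulated data are hypothetical.

```r
qap_cor_test <- function(a, b, reps = 1000) {
  off <- row(a) != col(a)
  stat <- function(x, y) cor(x[off], y[off])
  obs <- stat(a, b)
  n <- nrow(a)
  # Null distribution: relabel the vertices of one graph by permuting
  # its rows and columns simultaneously, then recompute the statistic
  null <- replicate(reps, {
    p <- sample(n)
    stat(a[p, p], b)
  })
  list(observed = obs,
       p_ge = mean(null >= obs),   # Pr(null >= observed)
       p_le = mean(null <= obs),   # Pr(null <= observed)
       null = null)
}

set.seed(123)
n <- 10
a <- matrix(rbinom(n * n, 1, 0.4), n, n); diag(a) <- 0
noise <- matrix(rbinom(n * n, 1, 0.1), n, n)
b <- abs(a - noise); diag(b) <- 0        # noisy copy of a
res <- qap_cor_test(a, b, reps = 500)
```

Permuting rows and columns together preserves the structure of the relabeled graph, so the null distribution conditions on both observed structures while breaking any association between vertex identities.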

Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
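The conditional tie probability above is simple to evaluate numerically. In the sketch below, the function bn_tie_prob and its argument names are hypothetical (they are not part of sna): p_base stands for Pr(B_e), p_bias for the vector of bias event probabilities Pr(B_k), and s for the corresponding counts s_k of potential bias events for the dyad in question.

```r
# Probability that i nominates j given the current trace: one minus the
# probability that the baseline event and every potential bias event fail
bn_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

bn_tie_prob(0.1, c(0.5, 0.2), c(0, 0))   # 0.1: baseline only
bn_tie_prob(0.1, c(0.5, 0.2), c(1, 0))   # 0.55: one event of the first type
bn_tie_prob(0.1, c(0.5, 0.2), c(1, 2))   # 0.712
```

With no bias events available the expression reduces to the baseline probability, and each additional potential bias event can only raise the probability of a tie.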

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

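$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \tag{5}$$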

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
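For simulation purposes it can help to write the model in reduced form. The following is a short derivation sketch, under the added assumption (not stated above) that the matrices $A = I - \sum_{i=1}^{w} \theta_i W_i$ and $B = I - \sum_{i=1}^{z} \psi_i Z_i$ are invertible; the symbols $A$ and $B$ are introduced here only for convenience. Collecting terms in Equations 4 and 5 gives

$$\varepsilon = B^{-1} \nu, \qquad y = A^{-1}\left(X\beta + \varepsilon\right) = A^{-1} X \beta + A^{-1} B^{-1} \nu,$$

so that, under the same assumption, $y$ is multivariate normal with mean $A^{-1} X \beta$ and covariance matrix $\sigma^2 A^{-1} B^{-1} B^{-\top} A^{-\top}$. Responses can therefore be simulated by drawing $\nu$, solving one linear system for $\varepsilon$, and solving a second linear system for $y$.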

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
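The sensitivity of the intersection LAS to false negatives is easy to see from a direct implementation of the classical estimators. The base R sketch below is illustrative only and does not reproduce sna's consensus function: the names central_graph and las are hypothetical, informant k is assumed to be vertex k, ties in the majority vote are resolved toward 0 by assumption, and the simulated reports contain false negatives only.

```r
# dat[k, i, j] holds informant k's report on the i -> j tie
central_graph <- function(dat) (apply(dat, c(2, 3), mean) > 0.5) * 1

las <- function(dat, rule = c("union", "intersection")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    own <- c(dat[i, i, j], dat[j, i, j])   # sender's and receiver's reports
    out[i, j] <- if (rule == "union") max(own) else min(own)
  }
  out
}

set.seed(7)
n <- 8
truth <- matrix(rbinom(n * n, 1, 0.4), n, n); diag(truth) <- 0
dat <- array(0, dim = c(n, n, n))
for (k in 1:n) {
  miss <- matrix(rbinom(n * n, 1, 0.35), n, n)   # omitted ties
  dat[k, , ] <- truth * (1 - miss)
}

acc <- c(intersection = mean(las(dat, "intersection") == truth),
         union        = mean(las(dat, "union") == truth),
         central      = mean(central_graph(dat) == truth))
```

Under false negatives alone, the intersection rule requires both parties to recall a tie and so can only lose edges relative to the union rule, while the majority vote pools all informants.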

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
        Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
        Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
        Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
        AIC: -100.9 BIC: -85.65

        Null model: meanstd
        Null log likelihood: -82.48 on 48 degrees of freedom
        AIC: 169.0 BIC: 172.8
        AIC difference (model versus null): 269.9
        Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form: (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

[Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.]

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

R> component.dist(g3[1,,])

$membership
 [1] 1 1 1 1 1 1 1 1 1 1

$csize
[1] 10

$cdist
 [1] 0 0 0 0 0 0 0 0 0 1

R> structure.statistics(g3[1,,])

   0    1    2    3    4    5    6    7    8    9
0.10 0.45 0.83 0.99 1.00 1.00 1.00 1.00 1.00 1.00

In addition to inspecting graph statistics directly, we can also compare them using conditional uniform graph (CUG) tests. Here, for example, we employ the absolute difference in reciprocities as a test statistic, first testing against a CUG hypothesis conditioning only on order, and second testing against a CUG hypothesis conditioning on both order and density.

R> g4 <- g1[1:2,,]
R> g4[2,,] <- g2[1,,]
R> cug <- cugtest(g4, gliop, cmode = "order", GFUN = grecip, OP = "-",
+    g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.299
    p(f(rnd) <= f(d)): 0.708

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.3333333
        1stQ:  -0.06666667
        Med:    0
        Mean:  -0.001288889
        3rdQ:   0.06666667
        Max:    0.3555556

R> cug <- cugtest(g4, gliop, GFUN = grecip, OP = "-", g1 = 1, g2 = 2)
R> summary(cug)

CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:   -0.06666667
        1stQ:   0.1555556
        Med:    0.2222222
        Mean:   0.2215333
        3rdQ:   0.2888889
        Max:    0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
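The logic of a conditional uniform graph test can be sketched in base R without relying on sna. The following is only an illustration under stated assumptions: the helpers `recip`, `runif_graph`, and `cug_recip_diff` are hypothetical names introduced here, reciprocity is taken to be the share of dyads whose two directed ties agree, and the statistic is the signed difference in reciprocity between two graphs. It is not the implementation used by cugtest.

```r
# Hypothetical helpers for illustration only; none of these are sna functions.

# Dyadic reciprocity: share of unordered pairs {i, j} with A[i, j] == A[j, i].
recip <- function(A) {
  ut <- upper.tri(A)
  mean(A[ut] == t(A)[ut])
}

# Uniform random digraph of order n. If m is NULL, condition on order only
# (each tie present with probability 0.5); otherwise fix the edge count at m.
runif_graph <- function(n, m = NULL) {
  A <- matrix(0, n, n)
  off <- which(row(A) != col(A))            # off-diagonal cells (no loops)
  if (is.null(m)) {
    A[off] <- rbinom(length(off), 1, 0.5)
  } else {
    A[sample(off, m)] <- 1
  }
  A
}

# Monte Carlo CUG test of the difference in reciprocity between A and B.
cug_recip_diff <- function(A, B, reps = 1000, cond_density = FALSE) {
  n <- nrow(A)
  obs <- recip(A) - recip(B)
  mA <- if (cond_density) sum(A) else NULL
  mB <- if (cond_density) sum(B) else NULL
  sim <- replicate(reps, recip(runif_graph(n, mA)) - recip(runif_graph(n, mB)))
  list(obs = obs, p_ge = mean(sim >= obs), p_le = mean(sim <= obs))
}

set.seed(1)
A <- runif_graph(10)
B <- runif_graph(10)
res_order   <- cug_recip_diff(A, B, reps = 200)                      # order only
res_density <- cug_recip_diff(A, B, reps = 200, cond_density = TRUE) # order and density
```

The two upper- and lower-tail proportions play the same role as the estimated p-values reported in the summaries above; conditioning on density simply changes the reference distribution from which the replicate graphs are drawn.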

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
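As a minimal illustration of the definition above, the following base R sketch checks whether two vertices of a digraph are structurally equivalent by comparing their out- and in-neighborhoods with the pair itself excluded. The function name `struct_equiv` is a hypothetical helper introduced here, and the input is assumed to be a 0/1 adjacency matrix without loops.

```r
# Hypothetical helper: TRUE if vertices v and w have identical out- and
# in-neighborhoods once v and w themselves are set aside.
struct_equiv <- function(A, v, w) {
  others <- setdiff(seq_len(nrow(A)), c(v, w))
  same_out <- all(A[v, others] == A[w, others])   # ties sent to third parties
  same_in  <- all(A[others, v] == A[others, w])   # ties received from third parties
  same_out && same_in
}

# Toy digraph: vertices 1 and 2 both send a tie to 3, and 3 sends a tie to 4.
A <- matrix(0, 4, 4)
A[1, 3] <- 1
A[2, 3] <- 1
A[3, 4] <- 1

eq_12 <- struct_equiv(A, 1, 2)   # 1 and 2 have the same alters
eq_13 <- struct_equiv(A, 1, 3)   # 1 and 3 do not
```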

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices, on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible.
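The Hamming-distance variant of this idea can be sketched in base R as follows. This is an illustrative assumption about how such a distance might be computed, not a reproduction of sedist: `se_hamming` is a hypothetical helper, and it compares the rows and columns of each vertex pair after dropping the entries that involve the pair itself. The result is then passed to R's standard hclust and cutree functions.

```r
# Hypothetical helper: Hamming distance between the row/column profiles of
# every pair of vertices in a 0/1 adjacency matrix.
se_hamming <- function(A) {
  n <- nrow(A)
  D <- matrix(0, n, n)
  for (v in seq_len(n - 1)) {
    for (w in (v + 1):n) {
      others <- setdiff(seq_len(n), c(v, w))
      D[v, w] <- D[w, v] <-
        sum(A[v, others] != A[w, others]) +   # differing out-ties
        sum(A[others, v] != A[others, w])     # differing in-ties
    }
  }
  D
}

# Toy digraph in which vertices 1 and 2 are exactly structurally equivalent.
A <- matrix(0, 4, 4)
A[1, 3] <- 1
A[2, 3] <- 1
A[3, 4] <- 1

D <- se_hamming(A)
hc <- hclust(as.dist(D), method = "complete")
classes <- cutree(hc, h = 0)   # keep only merges at distance 0 (exact equivalence)
```

Cutting the dendrogram at a larger height would merge vertices that are only approximately equivalent, which is the reduction the paragraph above describes.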

This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are comprised entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
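The density block type can be illustrated with a short base R sketch. The helper `block_density` is a hypothetical name introduced here, and it assumes that classes are coded as integers 1, ..., k and that the diagonal is to be neglected; it is meant to show the reduction step only, not the behavior of blockmodel itself.

```r
# Hypothetical helper: density block image of adjacency matrix A for a given
# class membership vector (classes coded 1..k). Diagonal cells are ignored,
# so a diagonal block for a single-vertex class is NaN (no cells to average).
block_density <- function(A, classes) {
  k <- max(classes)
  diag(A) <- NA
  B <- matrix(NA_real_, k, k)
  for (r in seq_len(k)) {
    for (s in seq_len(k)) {
      B[r, s] <- mean(A[classes == r, classes == s], na.rm = TRUE)
    }
  }
  B
}

# Toy digraph: every member of class 1 sends ties to every member of class 2.
A <- matrix(0, 4, 4)
A[1:2, 3:4] <- 1
classes <- c(1, 1, 2, 2)

B <- block_density(A, classes)
```

Because the two classes here are exact structural equivalence classes, every block of `B` is either 0 or 1, as noted in the parenthetical remark above.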

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
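Density expansion as described above amounts to a draw from an inhomogeneous Bernoulli graph. A minimal base R sketch, assuming a block image with no missing entries, is given below; `expand_density` is a hypothetical helper introduced for illustration and is not the package's blockmodel.expand.

```r
# Hypothetical helper: draw a digraph from a density block image B, placing
# sizes[r] vertices in class r. Each tie is an independent Bernoulli trial
# whose probability is the density of the block the dyad falls in.
expand_density <- function(B, sizes) {
  classes <- rep(seq_along(sizes), times = sizes)
  P <- B[classes, classes, drop = FALSE]    # vertex-level tie probabilities
  n <- length(classes)
  A <- matrix(rbinom(n * n, 1, P), n, n)
  diag(A) <- 0                              # no loops
  A
}

set.seed(42)
# Block image: class 1 always sends to class 2; class 2 has internal density 0.5.
B <- matrix(c(0, 0,
              1, 0.5), 2, 2)
G <- expand_density(B, c(3, 2))             # 3 vertices in class 1, 2 in class 2
```

Changing the `sizes` argument yields graphs of a different order with the same block structure, which is the situation described in the example that follows.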

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
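The boolean inner product in the definition above can be written directly in base R. The function `compose` below is a hypothetical stand-in used only to illustrate the definition; it assumes 0/1 adjacency matrices on a common vertex set.

```r
# Hypothetical helper: boolean matrix product of two 0/1 adjacency matrices.
# Entry (v, w) is 1 iff some intermediate vertex u has A[v, u] = 1 and B[u, w] = 1.
compose <- function(A, B) {
  ((A %*% B) > 0) * 1
}

# A loopless graph with a single mutual tie between vertices 1 and 2.
A <- matrix(0, 3, 3)
A[1, 2] <- 1
A[2, 1] <- 1

C <- compose(A, A)   # two-step paths within A
```

Here `C` contains loops at vertices 1 and 2 (each reaches itself via the other) even though `A` has none, which illustrates why the diagonal of a composition should not be discarded.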

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's $\Gamma$, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, $\Gamma$ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\operatorname{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)}, \qquad (3)$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\operatorname{cov}(G,G)$, and the graph correlation $\rho(G,H) = \operatorname{cov}(G,H) / \sqrt{\operatorname{cov}(G,G)\,\operatorname{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
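Equation 3 and the associated quantities can be computed by hand in a few lines of base R, which may help clarify what the package routines report. The helpers `graph_cov`, `graph_cor`, and `hamming` below are hypothetical names introduced for this sketch, which assumes loopless digraphs on a common vertex set; they are not the sna functions themselves.

```r
# Hypothetical helpers implementing the definitions above for loopless digraphs.
graph_cov <- function(A, B) {
  off <- row(A) != col(A)                 # the |V|(|V|-1) ordered pairs (i, j)
  a <- A[off]
  b <- B[off]
  mean((a - mean(a)) * (b - mean(b)))     # Equation 3
}

graph_cor <- function(A, B) {
  graph_cov(A, B) / sqrt(graph_cov(A, A) * graph_cov(B, B))
}

hamming <- function(A, B) {
  off <- row(A) != col(A)
  sum(A[off] != B[off])                   # number of edge changes
}

# Two small deterministic digraphs on six vertices.
A <- outer(1:6, 1:6, function(i, j) as.numeric((i + 2 * j) %% 3 == 0))
B <- outer(1:6, 1:6, function(i, j) as.numeric((i * j) %% 2 == 1))
diag(A) <- 0
diag(B) <- 0

off <- row(A) != col(A)
r_ab <- graph_cor(A, B)
h_ab <- hamming(A, B)
```

Since the normalizing constants cancel, `graph_cor(A, B)` coincides with the ordinary product-moment correlation of the off-diagonal cells, `cor(A[off], B[off])`.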

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
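For very small graphs, the maximization over matchings can be carried out by brute force, which makes the idea concrete. The base R sketch below is an illustration only: `perms`, `graph_cor`, and `struct_cor` are hypothetical helpers, all vertex relabelings are assumed to be permissible, and the exhaustive enumeration is feasible only for a handful of vertices.

```r
# Hypothetical helpers for a brute-force structural correlation.

# All permutations of the vector v, returned as a list.
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Graph correlation on off-diagonal cells (loopless digraphs).
graph_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

# Maximum graph correlation over all relabelings of B's vertices.
struct_cor <- function(A, B) {
  best <- -Inf
  for (p in perms(seq_len(nrow(B)))) {
    best <- max(best, graph_cor(A, B[p, p]))
  }
  best
}

# B is a directed path 1 -> 2 -> 3 -> 4 -> 5; A is a relabeled copy of B.
B <- matrix(0, 5, 5)
B[cbind(1:4, 2:5)] <- 1
relabel <- c(3, 1, 5, 2, 4)
A <- B[relabel, relabel]

labeled_cor    <- graph_cor(A, B)    # depends on the arbitrary labeling
structural_cor <- struct_cor(A, B)   # maximized over the 120 relabelings
```

The labeled correlation is below 1 because the two edge sets do not overlap under the given labels, whereas the structural correlation recovers the fact that the graphs are isomorphic.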

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
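The QAP null hypothesis described above can be sketched in base R: the vertices of one graph are relabeled at random while the other graph is held fixed, and the observed statistic is compared with the resulting distribution. The helpers `graph_cor` and `qap_cor` are hypothetical names used for illustration; this sketch is not the implementation behind qaptest.

```r
# Hypothetical helpers for a simple QAP test of graph correlation.
graph_cor <- function(A, B) {
  off <- row(A) != col(A)
  cor(A[off], B[off])
}

qap_cor <- function(A, B, reps = 1000) {
  n <- nrow(A)
  obs <- graph_cor(A, B)
  sim <- replicate(reps, {
    p <- sample(n)              # random reallocation of individuals to positions
    graph_cor(A[p, p], B)       # rows and columns are permuted together
  })
  list(obs = obs, p_ge = mean(sim >= obs), p_le = mean(sim <= obs))
}

# Two deterministic digraphs on eight vertices; B is the transpose of A.
A <- outer(1:8, 1:8, function(i, j) as.numeric((i + 2 * j) %% 3 == 0))
diag(A) <- 0
B <- t(A)

set.seed(11)
res <- qap_cor(A, B, reps = 500)
```

Because rows and columns are permuted jointly, each replicate preserves the full structure of `A` and changes only which vertex occupies which position, in contrast to a CUG test, which redraws the graph itself.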

Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate       Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14  0.000   1.000   0.000
x1           1.000000e+00  1.000   0.000   0.000
x2           4.000000e+00  1.000   0.000   0.000
x3           2.000000e+00  1.000   0.000   0.000
x4          -7.553990e-16  0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173  0.680   0.320   0.503
x1           0.9411361   2.5628914  0.985   0.015   0.019
x2           4.1473292  63.2648084  1.000   0.000   0.000
x3           1.8630911   6.4436238  1.000   0.000   0.000
x4          -0.1757242   0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose; there are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
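The conditional nomination probability above is simple to evaluate once the baseline probability, the bias event probabilities, and the counts $s_k$ are given. The base R sketch below uses a hypothetical helper, `tie_prob`, whose argument names are assumptions made for illustration; it evaluates the formula only and does not perform any estimation.

```r
# Hypothetical helper: Pr(e_ij | T) for baseline probability p_base, bias event
# probabilities p_bias (one per bias type k), and counts s (s[k] = number of
# type-k bias events that could have occurred for the dyad).
tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

p_none <- tie_prob(0.1, c(0.3, 0.2), c(0, 0))  # no bias events apply: baseline only
p_one  <- tie_prob(0.1, c(0.3, 0.2), c(1, 0))  # one event of the first type applies
```

With no applicable bias events the probability reduces to the baseline of 0.1; with one event of probability 0.3 it rises to 1 - 0.9 * 0.7 = 0.37, reflecting that the tie forms if either the baseline event or the bias event occurs.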

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

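$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \tag{5}$$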

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the AR term's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
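For readers unfamiliar with the autocorrelation indices mentioned above, the following base R sketch computes Moran's I for an attribute vector and a weight matrix using the standard definition. The function `morans_i` is a hypothetical illustration, not the nacf implementation, and the six-vertex ring is an invented toy example.

```r
# Hypothetical helper: Moran's I for attribute y and weight matrix W
# (standard definition; not taken from sna's nacf).
morans_i <- function(y, W) {
  z <- y - mean(y)                                  # centered attribute
  (length(y) / sum(W)) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
}

# Toy example: undirected ring on six vertices
n <- 6
W <- matrix(0, n, n)
for (i in 1:n) {
  j <- i %% n + 1                                   # successor of i on the ring
  W[i, j] <- 1
  W[j, i] <- 1
}

morans_i(rep(c(1, -1), 3), W)          # neighbors always differ: -1
morans_i(c(1, 1, 1, -1, -1, -1), W)    # neighbors mostly agree: 1/3
```

Strongly positive or negative values at a given neighborhood order hint that the corresponding weight matrix may be worth including in an lnam specification.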


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer)

Probability of False Negatives (e^-)

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+)

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
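One rough way to gauge how much prior mass falls in the "perverse" region is to simulate from the error rate priors. The sketch below assumes independent priors on em and ep and uses plain Monte Carlo in base R; the object names are illustrative only.

```r
set.seed(42)
n_sim <- 1e5

# beta(2, 11) priors on both error rates, as in the example above
perverse_beta <- mean(rbeta(n_sim, 2, 11) + rbeta(n_sim, 2, 11) > 1)

# For comparison: flat (uniform) priors on both error rates
perverse_flat <- mean(runif(n_sim) + runif(n_sim) > 1)

c(beta_2_11 = perverse_beta, flat = perverse_flat)
```

Under the beta(2, 11) priors almost no mass lies where em + ep > 1, whereas flat priors place about half of their mass there.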

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
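The intuition behind these aggregation rules can be sketched in a few lines of base R. The functions `central_graph` and `las` below are hypothetical simplifications written for illustration: they assume that informant k is also vertex k, use a strict majority threshold, and are not the consensus implementation (which may treat ties and diagonals differently).

```r
# reports[k, i, j] = 1 if informant k reports a tie from i to j

# Central graph (sketch): keep a tie if a strict majority of informants report it
central_graph <- function(reports) {
  (apply(reports, c(2, 3), mean) > 0.5) * 1
}

# Locally aggregated structure (sketch): use only the reports of i and j themselves
las <- function(reports, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(reports)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    a <- reports[i, i, j]                 # sender's own report
    b <- reports[j, i, j]                 # receiver's own report
    out[i, j] <- if (rule == "intersection") a * b else max(a, b)
  }
  out
}

reports <- array(0, dim = c(3, 3, 3))
reports[1, 1, 2] <- 1
reports[2, 1, 2] <- 1                     # informants 1 and 2 both report 1 -> 2
reports[1, 1, 3] <- 1                     # only informant 1 reports 1 -> 3

central_graph(reports)
las(reports, "intersection")
las(reports, "union")
```

The intersection rule drops a tie whenever either party fails to report it, which is why it suffers when false negatives are common; the union rule has the opposite weakness.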

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
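The simulation in this example relies on the reduced form of equations (4) and (5). Assuming the matrices in parentheses are invertible, solving each equation for its left-hand side gives:

```latex
\begin{align*}
\varepsilon &= \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \\
y &= \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1} \bigl(X\beta + \varepsilon\bigr).
\end{align*}
```

With a single AR matrix and a single MA matrix ($w = z = 1$), these are the two linear systems solved by the qr.solve calls, with r1 and r2 playing the roles of $\theta_1$ and $\psi_1$, and w1 and w2 those of $W_1$ and $Z_1$.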

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II. Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


CUG Test Results

Estimated p-values:
    p(f(rnd) >= f(d)): 0.967
    p(f(rnd) <= f(d)): 0.039

Test Diagnostics:
    Test Value (f(d)): 0.04444444
    Replications: 1000
    Distribution Summary:
        Min:  -0.06666667
        1stQ: 0.1555556
        Med:  0.2222222
        Mean: 0.2215333
        3rdQ: 0.2888889
        Max:  0.5333333

A broader range of similar Monte Carlo tests can be employed by comparing observed statistics against those arising from rgbn, rguman, or other included models.
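The logic of such Monte Carlo tests can be sketched in base R without any sna functions. In the sketch below, the reference model (Bernoulli digraphs with tie probability equal to the observed density), the test statistic (number of mutual dyads), and all function names are illustrative assumptions rather than the procedure used to produce the output above.

```r
set.seed(7)
n <- 10

# Bernoulli digraph on n vertices with tie probability p and no loops
rand_graph <- function(n, p) {
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0
  g
}

# Test statistic (an arbitrary choice for illustration): number of mutual dyads
mutuals <- function(g) sum(g * t(g)) / 2

obs <- rand_graph(n, 0.3)                    # stand-in for an observed graph
dens <- sum(obs) / (n * (n - 1))             # observed density, loops excluded
sims <- replicate(1000, mutuals(rand_graph(n, dens)))

p_ge <- mean(sims >= mutuals(obs))           # estimate of Pr(f(rnd) >= f(d))
p_le <- mean(sims <= mutuals(obs))           # estimate of Pr(f(rnd) <= f(d))
c(p_ge = p_ge, p_le = p_le)
```

Because simulated values equal to the observed statistic are counted in both tails, the two estimated p-values can sum to more than 1.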

2.5. Position and role analysis

The study of roles and positions is a strong tradition within social network analysis (see, e.g., Breiger et al. 1975; Burt 1976; Wasserman and Faust 1994; Doreian et al. 2005), and remains a popular means of reducing the complexity of large structures. Although many notions of "role" and "position" have been proposed (see Doreian et al. (2005) for an extensive treatment), the most widely used is without question structural equivalence. For a simple graph $G$, vertex $v$ is said to be structurally equivalent to vertex $v'$ iff $N(v) \setminus v' = N(v') \setminus v$ (i.e., when $v$ and $v'$ have the same alters). In the directed case, this same general property (mutatis mutandis) is required to hold for both in- and out-neighborhoods. Structurally equivalent vertices are copies in a graph theoretic sense, and are necessarily identical with respect to all structural properties; graph permutations which exchange only structurally equivalent vertices are necessarily automorphisms. As a true equivalence relation, structural equivalence divides a given graph into equivalence classes, which are termed positions. Since all vertices occupying a given position connect to other positions in precisely the same way, analyses of relations among positions (via their reduced form blockmodel; see below) can often be used in place of analyses of relations among vertices. Where non-trivial structural equivalence is present, this may result in an appreciable reduction in the size of the vertex set.
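The definition of exact structural equivalence for a simple (undirected, loopless) graph translates directly into code. The function `struct_equiv` below is a hypothetical base R sketch operating on a symmetric 0/1 adjacency matrix; it is not an sna function.

```r
# Hypothetical helper: TRUE iff N(v) \ w equals N(w) \ v
struct_equiv <- function(adj, v, w) {
  nv <- setdiff(which(adj[v, ] == 1), w)   # neighbors of v, excluding w
  nw <- setdiff(which(adj[w, ] == 1), v)   # neighbors of w, excluding v
  setequal(nv, nw)
}

# Star graph: vertex 1 is the center, vertices 2 to 4 are pendants
adj <- matrix(0, 4, 4)
adj[1, 2:4] <- 1
adj[2:4, 1] <- 1

struct_equiv(adj, 2, 3)   # two pendants attached to the same vertex
struct_equiv(adj, 1, 2)   # center versus pendant
```

The two pendants share their only neighbor and so occupy the same position, while the center and a pendant do not.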

In practice, exact structural equivalence is fairly rare (isolates and pendants being two important counterexamples). Nevertheless, one may identify vertices which are approximately structurally equivalent, in that their neighborhoods are "similar" in some well-defined sense. Common means of assessing similarity between two vertices are product-moment correlations, Euclidean distances, Hamming distances, or gamma coefficients applied to their respective rows and columns within the graph adjacency matrix. Within sna, sedist computes such indices for all pairs of vertices on one or more input graphs. Once these similarities/differences are calculated, conventional multivariate data analysis procedures (e.g., hierarchical clustering or multidimensional scaling) can be used to evaluate the extent of reduction which is possible. This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R's built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
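The two-step workflow described above (positional distances, then clustering) can be mimicked in base R. The helper `se_hamming` is a hypothetical stand-in for a positional distance function; its treatment of the entries linking the two vertices being compared is a simplifying assumption and may differ from what sedist does.

```r
# Hypothetical helper: Hamming distance between the rows and columns of each
# pair of vertices, ignoring the entries that involve the pair itself.
se_hamming <- function(adj) {
  n <- nrow(adj)
  d <- matrix(0, n, n)
  for (v in 1:n) for (w in 1:n) {
    keep <- setdiff(1:n, c(v, w))
    d[v, w] <- sum(adj[v, keep] != adj[w, keep]) +   # out-tie differences
               sum(adj[keep, v] != adj[keep, w])     # in-tie differences
  }
  d
}

# Toy digraph: vertices 1 to 3 each send ties to vertices 4 to 6
adj <- matrix(0, 6, 6)
adj[1:3, 4:6] <- 1

hc <- hclust(as.dist(se_hamming(adj)), method = "complete")
cutree(hc, k = 2)
```

Here the senders and receivers fall into two clusters with zero within-cluster distance, that is, two exact structural equivalence classes.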

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the ith and jth class is said to be the (i, j)th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to "blocks" of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R's cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, cell value descriptives, and categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
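For the density block type, the reduction step amounts to averaging adjacency matrix cells within each pair of classes. The following base R sketch illustrates this; `block_density` is a hypothetical helper, and details such as the handling of diagonal cells are assumptions rather than a description of blockmodel itself.

```r
# Hypothetical helper: density of each block given class memberships `cls`
# (integer labels 1, ..., k). Diagonal cells are ignored within a class.
block_density <- function(adj, cls) {
  k <- max(cls)
  out <- matrix(NA_real_, k, k)
  for (a in 1:k) for (b in 1:k) {
    blk <- adj[cls == a, cls == b, drop = FALSE]
    if (a == b) {
      off <- row(blk) != col(blk)                 # off-diagonal cells only
      out[a, b] <- if (any(off)) mean(blk[off]) else NA
    } else {
      out[a, b] <- mean(blk)
    }
  }
  out
}

# Toy digraph: vertices 1 to 3 each send ties to vertices 4 to 6
adj <- matrix(0, 6, 6)
adj[1:3, 4:6] <- 1
cls <- c(1, 1, 1, 2, 2, 2)

dens <- block_density(adj, cls)
dens
```

The resulting 2 x 2 image matrix has a single complete block (class 1 to class 2) and three null blocks.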

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
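Expansion of a density blockmodel can likewise be sketched in base R: each block density is copied to every pair of vertices in the corresponding classes, and ties are then drawn as independent Bernoulli trials. `expand_density` is a hypothetical helper, not the blockmodel.expand implementation, and it assumes a density matrix without missing values.

```r
# Hypothetical helper: draw one digraph from a density block image.
# dens:  k x k matrix of block densities (no NAs assumed)
# sizes: number of vertices to create in each of the k classes
expand_density <- function(dens, sizes) {
  cls <- rep(seq_along(sizes), sizes)       # class label of each new vertex
  p <- dens[cls, cls]                       # vertex-level tie probabilities
  g <- matrix(rbinom(length(p), 1, p), nrow(p), ncol(p))
  diag(g) <- 0                              # no loops
  g
}

set.seed(3)
dens <- matrix(c(0, 0, 1, 0), 2, 2)         # only class 1 -> class 2 ties occur
g_new <- expand_density(dens, c(2, 4))      # two vertices in class 1, four in class 2
g_new
```

Since the class sizes are free arguments, the expanded graph need not have the same order as the graph from which the densities were estimated.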

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna's support for such analyses is currently limited, a composition operator (%c%) is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in G$ and $(v'', v') \in G'$. (This is equivalent to the graph formed by the boolean inner product of the graphs' respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
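The boolean inner product characterization can be illustrated with a short base-R sketch. The function name compose is hypothetical and is not sna's operator; the example also shows how loops can arise from loop-free inputs.

```r
# Graph composition as a boolean matrix product (illustrative sketch).
compose <- function(A, B) {
  # (v, v') is an edge iff some v'' has A[v, v''] = 1 and B[v'', v'] = 1
  (A %*% B > 0) * 1
}

# Directed 3-cycle 1 -> 2 -> 3 -> 1 (no loops) and its reversal.
C3 <- matrix(0, 3, 3)
C3[cbind(1:3, c(2, 3, 1))] <- 1
C3r <- t(C3)

comp <- compose(C3, C3r)   # every vertex reaches itself in two steps
diag(comp)                 # all 1: the composition consists only of loops
```

Here neither input graph has loops, yet the composition is the identity relation, which is why the diagonal cannot be ignored.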

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V|-1)} \qquad (3)$$


where $A^G$, $A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V|-1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
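As a check on the definitions, the graph covariance, graph correlation, and Hamming distance for a pair of directed graphs can be computed directly in base R. The helper names below (offdiag, gcov_sketch, gcor_sketch, hamming_sketch) are hypothetical and simply transcribe the formulas given here; they are not the sna routines.

```r
# Direct transcription of the graph covariance formula for directed graphs.
offdiag <- function(A) A[row(A) != col(A)]        # the |V|(|V|-1) edge variables

gcov_sketch <- function(A, B) {
  a <- offdiag(A); b <- offdiag(B)
  sum((a - mean(a)) * (b - mean(b))) / length(a)  # divide by |V|(|V|-1)
}
gcor_sketch <- function(A, B) {
  gcov_sketch(A, B) / sqrt(gcov_sketch(A, A) * gcov_sketch(B, B))
}
hamming_sketch <- function(A, B) sum(offdiag(A) != offdiag(B))

# A directed 3-cycle and its reversal occupy complementary off-diagonal cells.
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
B <- t(A)

hamming_sketch(A, B)   # 6: every off-diagonal cell differs
gcov_sketch(A, A)      # 0.25: graph variance at density 0.5
gcor_sketch(A, B)      # -1: perfectly negatively correlated edge sets
```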

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
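For very small graphs, the maximization over matchings can be carried out by brute force, which makes the idea concrete. The following base-R sketch assumes all vertex relabelings are permissible; perms, offdiag_cor, and struct_cor are hypothetical helper names, and the approach is only feasible for a handful of vertices.

```r
# Structural correlation by exhaustive search over vertex relabelings (sketch).
perms <- function(n) {                       # all permutations of 1..n, one per row
  if (n == 1) return(matrix(1, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(k) {
    cbind(k, ifelse(p >= k, p + 1, p), deparse.level = 0)
  }))
}

offdiag_cor <- function(A, B) {
  m <- row(A) != col(A)
  cor(A[m], B[m])                            # graph correlation on edge variables
}

struct_cor <- function(A, B) {
  P <- perms(nrow(B))
  max(apply(P, 1, function(o) offdiag_cor(A, B[o, o])))
}

# A directed path on four vertices and a relabeled copy of it.
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
o <- c(3, 1, 4, 2)
B <- A[o, o]

offdiag_cor(A, B)   # labeled correlation; need not be 1
struct_cor(A, B)    # 1, since the graphs are isomorphic
```

The number of relabelings grows as n!, which is why heuristic search is used for graphs of realistic size.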

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
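The logic of a QAP test for a bivariate statistic can be sketched in a few lines of base R. This is a simplified illustration of the permutation idea, not the qaptest implementation; the graph size, densities, noise level, and replication count are arbitrary assumptions.

```r
# QAP-style permutation test for the graph correlation (illustrative sketch).
set.seed(42)
n <- 10
offdiag <- function(A) A[row(A) != col(A)]

X <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(X) <- 0
Y <- X                                          # dependent graph starts as a copy of X
flip <- matrix(rbinom(n * n, 1, 0.1), n, n) == 1
Y[flip] <- 1 - Y[flip]; diag(Y) <- 0            # then about 10% of cells are flipped

obs <- cor(offdiag(X), offdiag(Y))              # observed graph correlation

# Null distribution: relabel the vertices of Y (rows and columns together),
# which preserves Y's structure but breaks its alignment with X.
null <- replicate(500, {
  o <- sample(n)
  cor(offdiag(X), offdiag(Y[o, o]))
})

p_upper <- mean(null >= obs)                    # one-sided QAP p-value estimate
```

Because rows and columns are permuted simultaneously, each null draw is an isomorphic copy of Y, so the test conditions on the unlabeled structure of both graphs.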


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173  0.680   0.320   0.503
x1           0.9411361  2.5628914  0.985   0.015   0.019
x2           4.1473292 63.2648084  1.000   0.000   0.000
x3           1.8630911  6.4436238  1.000   0.000   0.000
x4          -0.1757242  0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
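To make the reporting process and the simplest data analytic estimators concrete, the following base-R sketch simulates error-prone informant reports and applies a majority-rule central graph and the two LAS rules. It is an illustration under stated assumptions (one informant per vertex, common error rates em and ep chosen arbitrarily, hypothetical helper las), not the consensus or bbnam code.

```r
# Simulated informant reports with false negative rate em and false positive rate ep.
set.seed(7)
n <- 8
g <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g) <- 0   # "true" graph
em <- 0.3; ep <- 0.05                                    # hypothetical error rates

dat <- array(0, dim = c(n, n, n))                        # dat[k, , ] = report of informant k
for (k in seq_len(n)) {
  p <- g * (1 - em) + (1 - g) * ep                       # Pr(report = 1) for each cell
  r <- matrix(rbinom(n * n, 1, p), n, n); diag(r) <- 0
  dat[k, , ] <- r
}

# Central graph under Hamming distance: cell-wise majority vote across informants.
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# Locally aggregated structures: edge (i, j) judged only from i's and j's own reports.
las <- function(dat, rule) {
  n <- dim(dat)[1]
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) if (i != j)
    out[i, j] <- rule(dat[i, i, j], dat[j, i, j])
  out
}
las_int <- las(dat, function(a, b) as.numeric(a == 1 && b == 1))  # both must report
las_uni <- las(dat, function(a, b) as.numeric(a == 1 || b == 1))  # either may report

acc <- c(central = mean(central == g),
         las_int = mean(las_int == g),
         las_uni = mean(las_uni == g))
```

The intersection rule can only remove edges relative to the union rule, so it is the more exposed of the two to false negatives.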

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
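The conditional tie probability is simple to evaluate numerically. The sketch below is a direct transcription of the expression above; the function name bn_tie_prob and the particular bias probabilities are assumptions for illustration only.

```r
# Conditional nomination probability under a biased net model (sketch).
#   p_base: baseline probability Pr(B_e)
#   p_bias: vector of bias event probabilities Pr(B_k)
#   s:      vector of counts s_k(i, j, T) of potential occurrences of each bias event
bn_tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Two hypothetical bias types with probabilities 0.5 and 0.2.
p_none <- bn_tie_prob(0.1, c(0.5, 0.2), c(0, 0))  # 0.10: baseline only
p_one  <- bn_tie_prob(0.1, c(0.5, 0.2), c(1, 0))  # 0.55 = 1 - 0.9 * 0.5
p_both <- bn_tie_prob(0.1, c(0.5, 0.2), c(1, 1))  # 0.64 = 1 - 0.9 * 0.5 * 0.8
```

The tie fails to form only if the baseline event and every applicable bias event all fail, which is why the probabilities combine multiplicatively on the complement.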

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

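$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$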

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of i.i.d. disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
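Solving equations (4) and (5) for $y$ gives $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$ with $\varepsilon = (I - \psi Z)^{-1}\nu$ in the case of a single AR and a single MA term. The base-R sketch below simulates one draw on that basis and verifies that it satisfies both equations. Row-normalizing the weight matrices is an assumption made here to keep the inverses well behaved, and all parameter values are arbitrary.

```r
# Simulating one draw from a network autocorrelation model with one AR and one MA term.
set.seed(3)
n <- 30
rownorm <- function(A) { s <- rowSums(A); s[s == 0] <- 1; A / s }  # rows sum to 1 (or 0)

W <- rownorm({A <- matrix(rbinom(n * n, 1, 0.2), n, n); diag(A) <- 0; A})  # AR weights
Z <- rownorm({A <- matrix(rbinom(n * n, 1, 0.2), n, n); diag(A) <- 0; A})  # MA weights
X <- matrix(rnorm(n * 2), n, 2)
beta <- c(1, -0.5); theta <- 0.4; psi <- 0.3; sigma <- 0.1   # hypothetical parameters

nu  <- rnorm(n, 0, sigma)                          # i.i.d. disturbances
eps <- solve(diag(n) - psi * Z, nu)                # equation (5) solved for eps
y   <- solve(diag(n) - theta * W, X %*% beta + eps)  # equation (4) solved for y

# Residuals of the two defining equations; both should vanish up to rounding.
res4 <- y - theta * (W %*% y) - X %*% beta - eps
res5 <- eps - psi * (Z %*% eps) - nu
```

With row-normalized weights and $|\theta|, |\psi| < 1$, the matrices $I - \theta W$ and $I - \psi Z$ are guaranteed to be invertible.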


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.)

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). “Social Structure from Multiple Networks II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems. Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.


Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


This process is facilitated by the function equiv.clust, which is essentially a joint front-end to R’s built-in hierarchical clustering function (hclust) and various positional distance functions, though it defaults to structural equivalence in particular. Taking a set of user-specified graphs as input, equiv.clust computes the distances between all pairs of positions using the selected distance function, and then performs a cluster analysis of the result. The return value is an object of class equiv.clust, for which various secondary analysis methods exist.
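The distance-then-cluster pipeline described above can be sketched in base R alone. The following is a simplified illustration, not the code used by equiv.clust: the helper name se_distance is hypothetical, and the distance used here (Manhattan distance between each vertex's concatenated row and column of the adjacency matrix) is an assumed stand-in for a structural equivalence distance that ignores the special handling of diagonal and reciprocal entries.

```r
# Simplified structural-equivalence distance (hypothetical helper, base R only).
# Each vertex is described by its outgoing ties (row) and incoming ties (column).
se_distance <- function(adj) {
  profiles <- cbind(adj, t(adj))
  dist(profiles, method = "manhattan")
}

# Toy digraph: vertices 1 and 2 each send ties to vertices 3 and 4.
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1

d <- se_distance(adj)                  # pairwise positional distances
hc <- hclust(d, method = "complete")   # hierarchical clustering of positions
classes <- cutree(hc, k = 2)           # cut the tree into two classes
print(classes)
```

In this toy case the two senders share one class and the two receivers share the other, since vertices within each pair have identical tie profiles.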

After clustering, the next phase of a positional analysis is frequently blockmodeling. Given a set of equivalence classes (in the form of an equiv.clust or hclust object, or membership vector) and one or more graphs, blockmodel will form a blockmodel of the input graph(s) based on the classes in question, using the specified block content type. A blockmodel can be thought of as a generalized relational structure on a set of vertex classes. The relationship between the $i$th and $j$th class is said to be the $i,j$th block, whose content is referred to as its corresponding block type. (This terminology originates from the observation that permuting the rows and columns of an adjacency matrix by vertex class can lead to “blocks” of discernible structure in the permuted matrix. For instance, blocks among structural equivalence classes are composed entirely of 1s or 0s, neglecting the diagonal.) Unless a vector of classes is specified, blockmodel forms its eponymous models by using R’s cutree function to cut an equivalence clustering by height or number of clusters (as specified). After forming clusters (classes), the input graphs are reordered by class and blockmodel reduction is applied. Block types currently supported include quantitative forms such as density (mean value of the cells in the associated adjacency matrix), row or column sums, and cell value descriptives, as well as categorical types (e.g., null, 1-covered, etc.). Once a given reduction is performed, the block structure itself can be analyzed, and/or expansion can be used to generate new graphs based on the image structure.
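As a rough illustration of the density block type, the reduction step can be written out in base R. This is a sketch under stated assumptions rather than the blockmodel implementation: block_density is a hypothetical helper, classes are assumed to be coded 1, ..., k, and self-ties are excluded from diagonal blocks.

```r
# Density blockmodel reduction (hypothetical helper, base R only).
# adj: square adjacency matrix; classes: integer class label per vertex.
block_density <- function(adj, classes) {
  k <- max(classes)
  dens <- matrix(NA_real_, k, k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      rows <- which(classes == a)
      cols <- which(classes == b)
      block <- adj[rows, cols, drop = FALSE]
      if (a == b) {
        # Within-class block: leave out the diagonal (self-ties)
        n <- length(rows)
        dens[a, b] <- if (n > 1) {
          (sum(block) - sum(diag(block))) / (n * (n - 1))
        } else {
          NA_real_
        }
      } else {
        dens[a, b] <- mean(block)
      }
    }
  }
  dens
}

# Toy digraph: class 1 (vertices 1, 2) sends to class 2 (vertices 3, 4).
adj <- matrix(0, 4, 4)
adj[1:2, 3:4] <- 1
bm <- block_density(adj, classes = c(1, 1, 2, 2))
print(bm)
```

Because the two classes here are structurally equivalent, every block density is exactly 0 or 1, matching the parenthetical remark above.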

The primary use of blockmodel expansion (performed using blockmodel.expand) is in generating simulated draws from a hypothesized blockmodel. Expansion involves generating a new network from a block image, and thus depends on the block types from which the blockmodel is composed; at present, only density is supported. For the density block type, expansion is performed by interpreting the interclass density as an edge probability, and by drawing random graphs from the Bernoulli parameter matrix formed by expanding the density model. Thus, repeated calls to blockmodel.expand can be used to generate a sample for Monte Carlo null hypothesis tests under an inhomogeneous Bernoulli graph model.
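The expansion logic for density blocks can be sketched as follows, again in base R only. The helper expand_density and its arguments are hypothetical names for illustration, and the sketch assumes a loopless digraph; it is not the blockmodel.expand code itself.

```r
# Expand a density block image into a random digraph (hypothetical helper).
# dens: k x k matrix of block densities; sizes: number of vertices per class.
expand_density <- function(dens, sizes) {
  membership <- rep(seq_along(sizes), times = sizes)
  n <- length(membership)
  p <- dens[membership, membership]        # n x n Bernoulli parameter matrix
  g <- matrix(rbinom(n * n, 1, p), n, n)   # one independent draw per dyad
  diag(g) <- 0                             # assumption: no self-ties
  g
}

set.seed(1)
dens <- matrix(c(0, 0, 1, 0), 2, 2)        # class 1 always sends to class 2
g_new <- expand_density(dens, sizes = c(3, 2))
print(g_new)
```

Note that the class sizes passed to the helper are free to differ from those of the graph that produced the block image, so the expanded graph need not have the same order as the original.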

Finally, we note that positional analyses have traditionally been closely associated with role algebras (White 1963; Boyd 1969; Boorman and White 1976), which seek to model empirical graph structure via the composition of multiple, simpler graphs. Although sna’s support for such analyses is currently limited, a composition operator, %c%, is available. The composition $G''$ of graphs $G$ and $G'$ on vertex set $V$ is the graph on $V$ such that $(v, v') \in E(G'')$ iff there exists a vertex $v''$ such that $(v, v'') \in E(G)$ and $(v'', v') \in E(G')$. (This is equivalent to the graph formed by the boolean inner product of the graphs’ respective adjacency matrices.) It should be noted that the composition of two graphs may have loops, even where the original graphs do not; thus, diagonals should not be neglected when analyzing the results of graph compositions.
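The boolean inner product underlying graph composition is short enough to write directly. The sketch below uses base R only; compose is a hypothetical helper name and does not reproduce sna's own operator, which additionally handles the package's graph data types.

```r
# Graph composition as a boolean matrix product (hypothetical helper).
# A tie v -> v' exists in the result iff some v'' has v -> v'' in a and v'' -> v' in b.
compose <- function(a, b) {
  (a %*% b > 0) * 1
}

# Directed 3-cycle: 1 -> 2 -> 3 -> 1.
g <- matrix(0, 3, 3)
g[1, 2] <- g[2, 3] <- g[3, 1] <- 1
gg <- compose(g, g)    # two-step reachability: 1 -> 3, 2 -> 1, 3 -> 2

# A loop-free mutual dyad whose self-composition consists only of loops.
m <- matrix(c(0, 1, 1, 0), 2, 2)
mm <- compose(m, m)
print(gg)
print(mm)
```

The second case illustrates the caution above: the mutual dyad has no loops, yet its composition with itself places a 1 on every diagonal cell.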

Example

To demonstrate the above routines, we begin by creating an inhomogeneous Bernoulli digraph with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a $p_1$ model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> gp <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = gp)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> ge <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> ge

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987); Krackhardt (1987a, 1988); Banks and Carley (1994); Butts and Carley (2005); Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert’s Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair $G, H$, for instance, the latter is given by

$$\mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^G_{ij} - \mu_G\right)\left(A^H_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \tag{3}$$

where $A^G, A^H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^X_{ij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
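For readers who wish to check Equation 3 by hand, the following base R sketch computes the graph covariance, graph correlation, and Hamming distance for loopless digraphs directly from adjacency matrices. The helper names (offdiag, graph_cov, graph_cor, hamming) are hypothetical and are not the sna functions named above.

```r
# Off-diagonal cells of an adjacency matrix: the |V|(|V| - 1) edge variables.
offdiag <- function(a) a[row(a) != col(a)]

# Graph covariance as in Equation 3 (divides by the number of dyads, not by n - 1).
graph_cov <- function(a, b) {
  x <- offdiag(a)
  y <- offdiag(b)
  mean((x - mean(x)) * (y - mean(y)))
}

# Graph correlation: covariance scaled by the two graph standard deviations.
graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}

# Hamming distance: number of ordered dyads on which the two graphs differ.
hamming <- function(a, b) sum(offdiag(a) != offdiag(b))

# A directed 3-cycle and its reversal occupy complementary sets of dyads.
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
h <- t(g)
print(c(cov = graph_cov(g, h), cor = graph_cor(g, h), hamming = hamming(g, h)))
```

Since the two cycles share no edges and together cover all six ordered dyads, the graph correlation is -1 and the Hamming distance is 6.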

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the “labeling” component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
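For very small graphs, the structural Hamming distance can be found by brute force, which makes the definition concrete. The sketch below (base R only, hypothetical helper names perms and struct_hamming) enumerates every relabeling of the second graph and assumes that all vertices are exchangeable; it is exhaustive search, not the annealing heuristic that sna uses by default, and it is infeasible beyond a handful of vertices.

```r
# All permutations of a vector, returned as a list (recursive, small inputs only).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Structural Hamming distance: minimum labeled distance over all relabelings of b.
struct_hamming <- function(a, b) {
  best <- Inf
  for (p in perms(seq_len(nrow(a)))) {
    d <- sum(a != b[p, p])
    if (d < best) best <- d
  }
  best
}

# Two directed paths that differ only in their vertex labels.
a <- matrix(0, 3, 3); a[1, 2] <- a[2, 3] <- 1   # 1 -> 2 -> 3
b <- matrix(0, 3, 3); b[3, 1] <- b[1, 2] <- 1   # 3 -> 1 -> 2
labeled <- sum(a != b)
structural <- struct_hamming(a, b)
print(c(labeled = labeled, structural = structural))
```

Here the labeled distance of 2 is entirely attributable to labeling: the graphs are isomorphic, so the structural component is 0.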

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis, such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the network versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
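The QAP logic described above (hold one graph fixed, permute the vertex labels of the other, and recompute the statistic) can be illustrated with a short base R sketch. The helpers gcor_simple and qap_pvalue are hypothetical names; the add-one correction in the p-value is a choice made for this sketch and is an assumption, not a description of what qaptest reports.

```r
offdiag <- function(a) a[row(a) != col(a)]
gcor_simple <- function(a, b) cor(offdiag(a), offdiag(b))

# Upper-tail QAP test for the graph correlation (hypothetical helper).
qap_pvalue <- function(a, b, reps = 199) {
  obs <- gcor_simple(a, b)
  n <- nrow(a)
  sims <- replicate(reps, {
    p <- sample(n)                  # random relabeling of the vertices
    gcor_simple(a[p, p], b)         # rows and columns permuted together
  })
  # Add-one correction: count the observed labeling as one draw (assumption).
  list(obs = obs, p_upper = (1 + sum(sims >= obs)) / (reps + 1))
}

set.seed(42)
n <- 10
x <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(x) <- 0
y <- x
y[1, 2:4] <- 1 - y[1, 2:4]          # y is x with three ties flipped
res <- qap_pvalue(x, y)
print(res)
```

Permuting rows and columns simultaneously preserves the structure of the permuted graph while breaking its alignment with the other graph, which is what distinguishes QAP from shuffling individual cells.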

Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)

[1] -0.1336306

R> gcor(g1, g3)

[1] 0.08908708

R> gcor(g2, g3)

[1] -0.4583333

R> gscor(g1, g2, reps = 1e5)

[1] 0.5345225

R> gscor(g1, g3, reps = 1e5)

[1] 0.5345225

R> gscor(g2, g3, reps = 1e5)

[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
         352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

    0   1
0   0   0
1  39 341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model’s parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former’s allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
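To make the error-rate parameterization concrete, consider the simplest case mentioned above, in which the false negative and false positive rates are given as known. The posterior probability of a single edge then follows from Bayes' rule applied to independent informant reports. The base R sketch below covers only this special case; edge_posterior is a hypothetical helper, independence of reports given the true edge state is assumed, and no claim is made that this mirrors the internals of bbnam, which also samples the error rates.

```r
# Posterior probability that an edge is present, given binary reports from
# several informants with KNOWN error rates (hypothetical helper).
# reports: 0/1 vector; em: false negative rates; ep: false positive rates.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  like1 <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | edge present)
  like0 <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | edge absent)
  prior * like1 / (prior * like1 + (1 - prior) * like0)
}

em <- rep(0.4, 3)    # informants miss true ties fairly often
ep <- rep(0.05, 3)   # but rarely report ties that do not exist
post_mixed <- edge_posterior(c(1, 1, 0), em, ep)
post_none  <- edge_posterior(c(0, 0, 0), em, ep)
print(c(two_of_three = post_mixed, none = post_none))
```

With a low false positive rate, two positive reports out of three are already strong evidence for the edge, whereas unanimous silence is only moderately strong evidence against it, because missed ties are common.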

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
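The conditional tie probability above is easy to evaluate numerically once the bias events are specified. In the base R sketch below, tie_prob is a hypothetical helper, p_base stands for $\Pr(B_e)$, p_bias for the vector of $\Pr(B_k)$, and s for the vector of counts $s_k(i,j,T)$; the numerical values are arbitrary and chosen only for illustration.

```r
# Conditional nomination probability under a biased net model (hypothetical helper):
# one minus the probability that neither the baseline event nor any of the
# potentially occurring bias events fires.
tie_prob <- function(p_base, p_bias, s) {
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

p_bias <- c(0.5, 0.2)   # two kinds of bias event, arbitrary probabilities

p_none  <- tie_prob(0.1, p_bias, s = c(0, 0))   # no bias event can occur
p_one   <- tie_prob(0.1, p_bias, s = c(1, 0))   # first bias can occur once
p_three <- tie_prob(0.1, p_bias, s = c(1, 2))   # first once, second twice
print(c(p_none, p_one, p_three))
```

When no bias events are possible for the dyad, the expression reduces to the baseline probability, and each additional potential bias event can only raise the probability of a tie.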

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

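$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \tag{4}$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \tag{5}$$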

where y isin Rn is a vector of responses X isin Rntimesx is a covariate matrix W isin Rwtimesntimesn andZ isin Rztimesntimesn are interaction arrays β isin Rx θ isin Rw and ψ isin Rz are free parameters andν sim Norm(0 σ2) is a vector of iid disturbances Z and ψ combine to form a network movingaverage (MA) term which expresses the extent to which disturbances diffuse through thenetwork Analogously W and θ describe autocorrelation structure in the responses (net-work AR effects) Pragmatically the distinction between the two effect types is the latterrsquosinclusion of impact from neighborsrsquo covariate scoresmdashan AR term implies that each individ-ualrsquos response depends on that of their neighbors (including all covariate disturbance andhigher-order neighborhood effects) while an MA term implies that conditional dependencebetween responses is limited to deviations from the expectation It is thus possible to specifyAR and MA effects in isolation as well as jointly Within sna the lnam function performsmaximum likelihood estimation for network autocorrelation models To aid in identifyingappropriate weight matrices for use with lnam sna also supplies a function (nacf) for com-putation of sample network autocorrelation and autocovariance functions nacf can computecorrelationscovariances for partial and complete in- out- and combined neighborhoods ofvarious orders as well as autocorrelation indices such as Moranrsquos I (Moran 1950) and GearyrsquosC (Geary 1954) Prior inspection of network autocorrelation functions can aid in proposingweight matrices for subsequent evaluation (in analogy to similar heuristics within the timeseries literature see eg Brockwell and Davis 1991) Functions such as sedist can also beused to construct matrices based on other structural properties (eg structural equivalence)see Leenders (2002) for a useful discussion
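A pure AR special case of Equation 4 (a single weight matrix, no MA term) can be simulated in base R by solving for the reduced form $y = (I - \theta W)^{-1}(X\beta + \varepsilon)$. The sketch below rests on several assumptions that are ours rather than the package's: $W$ is a row-normalized random digraph, the parameter values are arbitrary, and the disturbances are standard normal. It shows only how data consistent with the model can be generated, not how lnam estimates it.

```r
set.seed(7)
n <- 50

# Assumed weight matrix: random digraph, row-normalized so each row sums to 1
# (rows without ties are left as zeros).
W <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(W) <- 0
rs <- rowSums(W)
W <- W / ifelse(rs > 0, rs, 1)

X <- cbind(1, rnorm(n))    # intercept plus one covariate
beta <- c(1, 2)            # arbitrary illustrative coefficients
theta <- 0.4               # AR parameter; |theta| < 1 keeps I - theta W invertible
eps <- rnorm(n)            # iid disturbances (no MA term in this sketch)

# Reduced form of Equation 4 with w = 1 and no MA component.
y <- solve(diag(n) - theta * W, X %*% beta + eps)

# Check that the simulated responses satisfy the structural equation.
resid_check <- max(abs(y - (theta * W %*% y + X %*% beta + eps)))
print(resid_check)
```

The reduced form makes the point in the text explicit: because $(I - \theta W)^{-1}$ multiplies both $X\beta$ and $\varepsilon$, each response depends on neighbors' covariates and disturbances at all orders of the neighborhood.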


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116   Med: 0.9992194   IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
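To see why the region em + ep > 1 deserves the label "perverse," it may help to consider a single dyad with known error rates. The following base-R sketch is not the bbnam algorithm (which estimates the error rates rather than taking them as given); the helper post_tie and its arguments are hypothetical names introduced only for illustration, and the error rates are assumed common to all informants.

```r
# Posterior probability that a tie exists, given k positive reports from m
# informants. Hypothetical helper: assumes known error rates shared by all
# informants, which bbnam does NOT assume (it estimates them).
post_tie <- function(k, m, em, ep, prior = 0.5) {
  like1 <- (1 - em)^k * em^(m - k)      # P(reports | tie present)
  like0 <- ep^k * (1 - ep)^(m - k)      # P(reports | tie absent)
  prior * like1 / (prior * like1 + (1 - prior) * like0)
}

k <- 0:5
ordinary <- post_tie(k, m = 5, em = 0.375, ep = 0.038)  # em + ep < 1
perverse <- post_tie(k, m = 5, em = 0.70,  ep = 0.60)   # em + ep > 1

round(ordinary, 3)   # increases with the number of positive reports
round(perverse, 3)   # decreases: each "yes" is evidence against the tie
```

The posterior rises with the number of positive reports exactly when (1 - em)(1 - ep) > em * ep, i.e., when em + ep < 1; beyond that point informants are more likely to report a tie when none exists than when one does.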

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)


Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25


with edge probabilities which are constant by sending vertex. (This is equivalent to drawing from a p1 model containing only expansiveness and density effects.) We then produce an equivalence clustering and associated blockmodel, ultimately using the blockmodel to produce a new graph. As demonstrated, new graphs produced in this way need not be of the same order as the original; this is useful when simulating a hypothetical case in which individual actors may have entered or left a network without changing the underlying group structure.

R> g.p <- sapply(runif(20, 0, 1), rep, 20)
R> g <- rgraph(20, tprob = g.p)
R> eq <- equiv.clust(g)
R> b <- blockmodel(g, eq, h = 15)
R> g.e <- blockmodel.expand(b, rep(2, length(b$rlabels)))
R> g.e

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    0    1    1    0    0    1    0    0     1     1     1
 [2,]    0    0    1    1    0    0    1    1    0     1     1     1
 [3,]    0    0    0    0    1    1    1    1    0     0     0     0
 [4,]    0    0    1    0    1    1    1    1    0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    1     1     0     0
 [6,]    0    1    1    0    0    0    1    0    1     1     0     0
 [7,]    0    0    1    1    0    1    0    1    1     1     0     1
 [8,]    0    0    1    1    0    0    1    0    0     1     0     1
 [9,]    0    0    0    1    1    1    0    1    0     0     0     0
[10,]    0    0    1    1    0    1    1    1    1     0     1     1
[11,]    0    0    0    0    0    0    1    1    0     0     0     1
[12,]    0    1    1    1    0    0    0    1    0     0     1     0
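The idea behind expanding a blockmodel can be sketched in base R as follows. This is a minimal illustration that assumes block densities are used directly as Bernoulli tie probabilities; expand_blocks, dens, and size are hypothetical names, and the code is not a reimplementation of the package's own expansion routine.

```r
# Draw a new graph from a matrix of block densities.
# dens: k x k matrix of tie probabilities between blocks
# size: number of vertices to create in each block (length k)
expand_blocks <- function(dens, size) {
  memb <- rep(seq_along(size), times = size)  # block membership of new vertices
  p <- dens[memb, memb]                       # tie probability for every dyad
  n <- length(memb)
  g <- matrix(rbinom(n * n, 1, p), n, n)      # independent Bernoulli draws
  diag(g) <- 0                                # no loops
  g
}

set.seed(5)
dens <- matrix(c(0.9, 0.1,
                 0.2, 0.8), 2, 2, byrow = TRUE)
g_new <- expand_blocks(dens, size = c(3, 4))  # 3 + 4 = 7 vertices
dim(g_new)
```

Because only the block densities and the requested block sizes enter the draw, the order of the new graph is unconstrained by the order of the graph from which the densities were estimated.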

2.6. Exploratory edge set comparison

One important alternative to graph comparison using structural indices or subgraph statistics is direct comparison of edge sets. Within this general paradigm (see Hubert (1987), Krackhardt (1987a, 1988), Banks and Carley (1994), Butts and Carley (2005), and Butts (2007) for examples), comparison is based on establishing a matching between the edges of one graph and the edges of another, leading to a measure of correspondence between the two. In the simplest case of multiple graphs on the same vertex set, the matching in question may be between those edges having the same (ordered) endpoints. One natural correspondence measure is then the Hamming distance, i.e., the number of edge changes needed to take one graph into the other. Another useful measure is Hubert's Γ, or the uncentered product-moment between the two sets of edge variables. For appropriate transformations of the original data, Γ can be interpreted as the correlation or covariance between the edge variable sets; when entire adjacency matrices are compared in this way, the result is known as the graph correlation or graph covariance (respectively). For a directed graph pair G, H, for instance, the latter is given by

$$ \mathrm{cov}(G,H) = \frac{\sum_{(i,j)} \left(A^{G}_{ij} - \mu_G\right)\left(A^{H}_{ij} - \mu_H\right)}{|V|\,(|V| - 1)}, \qquad (3) $$


where $A^{G}$ and $A^{H}$ are the respective adjacency matrices of G and H, and $\mu_X = \left(|V|\,(|V| - 1)\right)^{-1} \sum_{(i,j)} A^{X}_{ij}$ is the graph mean. The graph variance is then cov(G, G), and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
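Equation 3 and the Hamming distance can be written out directly in base R for directed graphs without loops. The sketch below is independent of the package's own routines; the helper names offdiag, graph_cov, graph_cor, and hamming are hypothetical and introduced only for illustration.

```r
# Edge variables of a loopless directed graph: the |V|(|V| - 1) off-diagonal cells.
offdiag <- function(a) a[row(a) != col(a)]

# Graph covariance as in Equation 3 (divides by |V|(|V| - 1), not by one less).
graph_cov <- function(g, h) {
  x <- offdiag(g)
  y <- offdiag(h)
  mean((x - mean(x)) * (y - mean(y)))
}

# Graph correlation: covariance scaled by the two graph standard deviations.
graph_cor <- function(g, h) {
  graph_cov(g, h) / sqrt(graph_cov(g, g) * graph_cov(h, h))
}

# Hamming distance: number of ordered pairs whose edge state differs.
hamming <- function(g, h) sum(offdiag(g) != offdiag(h))

set.seed(1)
n <- 6
g <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g) <- 0
h <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(h) <- 0

graph_cor(g, h)
hamming(g, h)
```

Since the normalizing constants cancel in the ratio, the graph correlation computed this way coincides with the ordinary Pearson correlation of the two vectors of off-diagonal edge variables.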

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
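For very small graphs, the optimal matching can be found by brute force, which makes the definition concrete. The following base-R sketch assumes that every relabeling of the vertices is permissible and enumerates all of them; perms and struct_hamming are hypothetical helper names, and this exhaustive approach is a stand-in for, not a description of, the heuristic search used in practice.

```r
# All orderings of the elements of v, returned as a list of vectors.
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Structural Hamming distance: the smallest labeled distance over all
# relabelings of h (n! candidates, so only feasible for tiny graphs).
struct_hamming <- function(g, h) {
  best <- Inf
  for (p in perms(seq_len(nrow(h)))) {
    d <- sum(g != h[p, p])
    if (d < best) best <- d
  }
  best
}

set.seed(2)
n <- 5
g2 <- matrix(rbinom(n * n, 1, 0.5), n, n); diag(g2) <- 0
relabel <- sample(n)
g3 <- g2[relabel, relabel]   # isomorphic copy of g2 with shuffled labels

sum(g2 != g3)                # labeled Hamming distance
struct_hamming(g2, g3)       # 0: some relabeling of g3 recovers g2 exactly
```

The gap between the two printed values is the "labeling" component of the distance; the number of candidate matchings grows factorially with graph order, which is why heuristic optimization is needed beyond toy cases.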

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
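The central graph described above has a simple closed form for dichotomous data: an edge is included when it appears in a majority of the input graphs. The base-R sketch below illustrates this under the assumption of an odd number of input graphs (so that ties at exactly 50% cannot occur and no tie rule is needed); central_graph and total_hamming are hypothetical helper names rather than package functions.

```r
# Majority-rule central graph for an m x n x n stack of 0/1 adjacency matrices.
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Sum of Hamming distances from g to every graph in the stack.
total_hamming <- function(g, dat) {
  sum(apply(dat, 1, function(a) sum(a != g)))
}

set.seed(4)
m <- 7; n <- 8
truth <- matrix(rbinom(n * n, 1, 0.3), n, n); diag(truth) <- 0
dat <- array(0, dim = c(m, n, n))
for (i in seq_len(m)) {
  flip <- matrix(rbinom(n * n, 1, 0.15), n, n)  # informant-specific errors
  rep_i <- (truth + flip) %% 2                  # flip the selected dyads
  diag(rep_i) <- 0
  dat[i, , ] <- rep_i
}

cg <- central_graph(dat)
total_hamming(cg, dat)                                 # central graph
min(apply(dat, 1, function(a) total_hamming(a, dat)))  # best single input graph
mean(cg == truth)                                      # agreement with the truth
```

Because the total Hamming distance decomposes into a sum over dyads, choosing the majority state for each dyad minimizes it, so the central graph can never do worse on this criterion than any single graph in the input set.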

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net* versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
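The logic of a QAP test can be illustrated in a few lines of base R. The sketch below assumes a one-sided test of the graph correlation and permutes rows and columns of one graph simultaneously, which preserves its structure while breaking any alignment with the other graph; qap_cor and its return fields are hypothetical names, and the code is an illustration of the idea rather than the package's implementation.

```r
# One-sided QAP test for the correlation between two loopless directed graphs.
qap_cor <- function(g, h, reps = 1000) {
  off <- row(g) != col(g)                # off-diagonal (edge variable) cells
  obs <- cor(g[off], h[off])             # observed graph correlation
  n <- nrow(g)
  null <- replicate(reps, {
    p <- sample(n)                       # random relabeling of the vertices
    cor(g[p, p][off], h[off])            # same structure, scrambled alignment
  })
  list(obs = obs, null = null, p_geq = mean(null >= obs))
}

set.seed(3)
n <- 10
h <- matrix(rbinom(n * n, 1, 0.4), n, n); diag(h) <- 0
noise <- matrix(rbinom(n * n, 1, 0.1), n, n)
g <- (h + noise) %% 2; diag(g) <- 0      # h with roughly 10% of dyads flipped

res <- qap_cor(g, h)
res$obs      # observed correlation
res$p_geq    # share of permuted correlations at least as large
```

Permuting labels rather than individual edges keeps the degree distribution, reciprocity, and other structural features of g fixed under the null, which is what distinguishes this test from a conventional permutation test on the vectorized edge variables.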


Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner.

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)


OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1   Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
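The point estimates in this output are what one would obtain from ordinary least squares on the vectorized adjacency matrices. A minimal base R sketch of that equivalence follows; it assumes directed graphs without loops (so the diagonal is dropped), uses rbinom in place of rgraph so that it runs without sna, and does not reproduce the QAP p-values, which require the permutation step.

```r
set.seed(7)
n <- 20
# Four random predictor graphs stored as a 4 x n x n array.
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]

off <- row(y) != col(y)              # keep the n * (n - 1) off-diagonal dyads
dyads <- data.frame(
  y  = y[off],
  x1 = x[1, , ][off],
  x2 = x[2, , ][off],
  x3 = x[3, , ][off],
  x4 = x[4, , ][off]
)

fit <- lm(y ~ x1 + x2 + x3 + x4, data = dyads)
round(coef(fit), 6)                  # intercept 0, then 1, 4, 2, 0
```

With 20 vertices there are 380 directed dyads and five estimated parameters, which accounts for the 375 residual degrees of freedom reported by netlm.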

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> y.p <- apply(y.l, c(1, 2), function(a) { 1 / (1 + exp(-a)) })
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180  1.3603173 0.680   0.320   0.503
x1           0.9411361  2.5628914 0.985   0.015   0.019
x2           4.1473292 63.2648084 1.000   0.000   0.000
x3           1.8630911  6.4436238 1.000   0.000   0.000
x4          -0.1757242  0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
        352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572   BIC: 203.8580
Pseudo-R^2 Measures:
        (Dn-Dr)/(Dn-Dr+dfn): 0.481324
        (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
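The contingency table diagnostics discussed above can be reproduced in spirit with a conventional logistic regression on the vectorized dyads. The sketch below is a base R approximation under stated assumptions: it uses glm rather than netlogit, classifies a tie as predicted when the fitted probability exceeds 0.5 (an assumed threshold), and uses rbinom in place of rgraph. Depending on the random draw, glm may warn about fitted probabilities near 0 or 1; this does not affect the table.

```r
set.seed(11)
n <- 20
x <- array(rbinom(4 * n * n, 1, 0.5), dim = c(4, n, n))
eta <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]        # true linear predictor
y <- matrix(rbinom(n * n, 1, plogis(eta)), n, n)     # dense dependent graph

off <- row(y) != col(y)                              # drop the diagonal
dyads <- data.frame(y = y[off], x1 = x[1, , ][off], x2 = x[2, , ][off],
                    x3 = x[3, , ][off], x4 = x[4, , ][off])
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = dyads)

# Predicted-by-actual table at the assumed 0.5 threshold.
pred <- as.integer(fitted(fit) > 0.5)
tab <- table(predicted = factor(pred, levels = 0:1),
             actual = factor(dyads$y, levels = 0:1))
frac_correct <- sum(diag(tab)) / sum(tab)
density <- mean(dyads$y)                             # share of dyads with a tie
tab
c(frac_correct = frac_correct, density = density)
```

When nearly every fitted probability exceeds the threshold, the "predicted 0" row is empty or almost so, and the fraction of predicted 0s that are correct is undefined or unstable; this is the situation behind the NaN in the output above.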

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
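For intuition about the error-rate parameterization, consider the simplest case, in which the false positive and false negative rates are treated as known. The posterior probability of a single edge then follows directly from Bayes' rule. The base R sketch below illustrates only this special case; the function name edge_posterior is hypothetical, reports are assumed to be conditionally independent given the true edge state, and the full bbnam sampler (which also draws the error rates) is not reproduced.

```r
# Posterior probability that an edge exists, given binary reports from
# independent informants with known error rates.
#   y     : 0/1 reports, one per informant
#   em    : false negative rates, Pr(report 0 | edge present)
#   ep    : false positive rates, Pr(report 1 | edge absent)
#   prior : prior probability that the edge is present
edge_posterior <- function(y, em, ep, prior = 0.5) {
  lik1 <- prod(ifelse(y == 1, 1 - em, em))   # likelihood if the edge is present
  lik0 <- prod(ifelse(y == 1, ep, 1 - ep))   # likelihood if the edge is absent
  prior * lik1 / (prior * lik1 + (1 - prior) * lik0)
}

# Two of three informants report the tie; false negatives are common
# (0.375) and false positives are rare (0.038).
edge_posterior(y = c(1, 1, 0), em = rep(0.375, 3), ep = rep(0.038, 3))  # about 0.99

# Informants with em = ep = 0.5 carry no information: posterior equals prior.
edge_posterior(y = c(1, 0), em = c(0.5, 0.5), ep = c(0.5, 0.5), prior = 0.3)
```

The first call shows why positive reports are so persuasive when false positives are rare: a single dissenting informant barely moves the posterior.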

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model, then, involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
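The conditional tie probability above is easy to evaluate directly. The following base R sketch is a literal transcription of the formula, with hypothetical argument names (p_base for the baseline probability, p_bias for the bias event probabilities, s for the counts of potentially occurring bias events); it is not part of the bn interface.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base : baseline nomination probability, Pr(B_e)
#   p_bias : vector of bias event probabilities, Pr(B_k)
#   s      : vector of counts s_k(i, j, T), one per bias type
tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# One opportunity for the first bias event, none for the second:
# 1 - 0.9 * 0.5 = 0.55
tie_prob(p_base = 0.1, p_bias = c(0.5, 0.2), s = c(1, 0))

# With no bias opportunities the probability reduces to the baseline.
tie_prob(p_base = 0.1, p_bias = c(0.5, 0.2), s = c(0, 0))
```

The tie fails to form only if the baseline event and every applicable bias event all fail, which is why the complement is a product of failure probabilities.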

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

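$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$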

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
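For simulation and for interpreting the AR and MA terms, it can help to solve equations (4) and (5) for $y$. The following worked step is a sketch that assumes both $I - \sum_i \theta_i W_i$ and $I - \sum_i \psi_i Z_i$ are invertible; the shorthand $W_\theta$ and $Z_\psi$ is introduced here for convenience and is not notation from the paper.

```latex
\begin{align*}
W_\theta &= \sum_{i=1}^{w} \theta_i W_i, \qquad
Z_\psi = \sum_{i=1}^{z} \psi_i Z_i, \\
(I - Z_\psi)\,\varepsilon &= \nu
  \;\Longrightarrow\; \varepsilon = (I - Z_\psi)^{-1}\nu, \\
(I - W_\theta)\,y &= X\beta + \varepsilon
  \;\Longrightarrow\; y = (I - W_\theta)^{-1}\bigl(X\beta + (I - Z_\psi)^{-1}\nu\bigr).
\end{align*}
```

Read this way, data can be generated from the model by first solving a linear system for the disturbances and then a second one for the responses.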


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20   Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which $e^- + e^+ > 1$. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
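The potential scale reduction values reported in the MCMC diagnostics compare between-chain and within-chain variability. The base R sketch below shows one common simplified form of the Gelman-Rubin statistic; the function name psrf is hypothetical, and it is an assumption (not confirmed here) that this matches the exact variant computed by potscalered.mcmc.

```r
# chains: numeric matrix with one column per chain and one row per draw.
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B_over_n <- var(colMeans(chains))      # variance of the chain means
  var_hat <- (n - 1) / n * W + B_over_n  # pooled estimate of the target variance
  sqrt(var_hat / W)
}

set.seed(3)
mixed <- matrix(rnorm(5 * 200), ncol = 5)            # five chains, same target
stuck <- mixed + rep(c(0, 0, 0, 0, 3), each = 200)   # last chain shifted by 3
psrf(mixed)   # close to 1: the chains agree
psrf(stuck)   # well above 1: the chains disagree
```

Values near 1, as in the summary above, suggest that the replicate chains are sampling from similar distributions; values well above 1 suggest that longer burn-in or more draws are needed.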

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
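The behavior of the simpler estimators can be seen on a toy example. The base R sketch below gives one plausible reading of the LAS and central graph rules (the i to j tie is judged from the reports of i and j themselves for LAS, and by majority vote across all informants for the central graph). The function names las and central_graph are hypothetical, and the exact tie-breaking conventions of consensus are not reproduced.

```r
# dat[k, i, j] is informant k's report on the i -> j tie; informant k is also
# vertex k, so dat[i, i, j] and dat[j, i, j] are the two "local" reports.
las <- function(dat, rule = c("union", "intersection")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    if (i == j) next
    own <- c(dat[i, i, j], dat[j, i, j])
    # union: either party reports the tie; intersection: both must report it
    out[i, j] <- if (rule == "union") max(own) else min(own)
  }
  out
}

central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1   # majority vote on each dyad
}

# Toy data: three informants on three vertices; the true graph has 1 -> 2 only.
g <- matrix(0, 3, 3)
g[1, 2] <- 1
dat <- array(0, dim = c(3, 3, 3))
dat[1, , ] <- g        # informant 1 reports correctly
dat[2, , ] <- 0        # informant 2 misses the tie (a false negative)
dat[3, , ] <- g        # informant 3 reports correctly

las(dat, "union")          # recovers the tie
las(dat, "intersection")   # loses the tie: one false negative is enough
central_graph(dat)         # recovers the tie by majority
```

A single false negative from either party removes the tie under the intersection rule, which is consistent with its weak performance when false negative rates are high.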

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error Z value Pr(>|z|)
X1     -0.331259  0.010831   -30.58  <2e-16 ***
X2      0.535608  0.009448    56.69  <2e-16 ***
X3     -0.685068  0.007138   -95.98  <2e-16 ***
X4      0.691812  0.008417    82.19  <2e-16 ***
X5      0.016491  0.007890     2.09  0.0366 *
rho1.1  0.194935  0.002575    75.71  <2e-16 ***
rho2.1  0.307491  0.021167    14.53  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma 0.09597  9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. [Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.


Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.


Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.


Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.


Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

where $A_G$ and $A_H$ are the respective adjacency matrices of $G$ and $H$, and $\mu_X = (|V|(|V|-1))^{-1} \sum_{(i,j)} A_{Xij}$ is the graph mean. The graph variance is then $\mathrm{cov}(G,G)$, and the graph correlation is $\rho(G,H) = \mathrm{cov}(G,H) / \sqrt{\mathrm{cov}(G,G)\,\mathrm{cov}(H,H)}$. Within sna, graph correlations and covariances can be obtained by using gcor and gcov, respectively. Hamming distances for graph sets can be similarly obtained using hdist.
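To make these definitions concrete, the following base R sketch computes graph covariance and correlation directly from two adjacency matrices. It assumes directed graphs without loops (so the diagonal is excluded) and takes the covariance to be the mean product of centered off-diagonal cells. The helper names graph_cov and graph_cor are illustrative and are not part of sna; in practice gcor and gcov should be preferred.

```r
# Graph covariance over off-diagonal cells (illustrative helper, not sna's gcov).
graph_cov <- function(a, b) {
  stopifnot(all(dim(a) == dim(b)), nrow(a) == ncol(a))
  off <- row(a) != col(a)               # drop the diagonal (no loops)
  x <- a[off]
  y <- b[off]
  mean((x - mean(x)) * (y - mean(y)))   # averages over |V|(|V| - 1) cells
}

# Graph correlation: covariance scaled by the two graph standard deviations.
graph_cor <- function(a, b) {
  graph_cov(a, b) / sqrt(graph_cov(a, a) * graph_cov(b, b))
}

# Two small digraphs on three vertices.
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)
h <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
rho <- graph_cor(g, h)
```

Note that the correlation is undefined when either graph is empty or complete, since the corresponding graph variance is then zero.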

The above situation becomes more complex when there is not a unique matching between edge sets. Butts and Carley (2005) provide a family of generalizations for these cases, which they term structural distances/covariances. These measures are based on maximizing the correspondence between edge sets under a set of permissible matchings; this results in a decomposition of the total distance/covariance into that which is attributable to fixed aspects of the structure (the structural component) versus that which depends on the (potentially variable) matching (the "labeling" component). sna provides tools to obtain approximate structural comparison measures, using heuristic optimization methods to seek an optimal matching. The analogs to hdist in this regard are structdist and sdmat, and those to gcor and gcov are gscor and gscov. For optimal matching for arbitrary bivariate statistics on graphs of identical order, the lab.optimize routines can also be employed. Several methods are supported, of which the default (simulated annealing) seems to be the most effective in practice.
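The idea of maximizing correspondence over matchings can be sketched in base R for very small graphs by exhaustive search over all vertex relabelings. This is only an illustration under the assumption that every vertex permutation is permissible; all_perms, graph_cor and struct_cor are hypothetical helper names, and the search is factorial in the number of vertices, which is why heuristic methods are needed in general.

```r
# Correlation of off-diagonal cells (illustrative stand-in for sna's gcor).
graph_cor <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

# All orderings of the vector v, as a list (feasible only for short v).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Structural correlation: best labeled correlation over all relabelings of b.
struct_cor <- function(a, b) {
  best <- -Inf
  for (p in all_perms(seq_len(nrow(b)))) {
    best <- max(best, graph_cor(a, b[p, p]))
  }
  best
}

g <- matrix(0, 4, 4)
g[cbind(1:3, 2:4)] <- 1          # directed path 1 -> 2 -> 3 -> 4
p <- c(3, 1, 4, 2)
h <- g[p, p]                     # same structure, vertices relabeled

labeled <- graph_cor(g, h)       # depends on the arbitrary labeling
structural <- struct_cor(g, h)   # recovers the isomorphism
```

Here the labeled correlation is negative because no edge of g coincides with an edge of h, while the structural correlation reaches 1 because the two graphs are isomorphic.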

Given a set of distances among graphs, analysis can then proceed using standard R tools for exploratory multivariate analysis such as cmdscale and hclust. Functionality specific to sna includes centralgraph (which returns the graph minimizing the Hamming distance to all graphs in the input set), gclust.boxstats (which shows distributions of graph statistics based on a hierarchical clustering of networks), gclust.centralgraph (which returns the central graphs for each element of a network clustering solution), gdist.plotdiff (which plots distances between networks against differences in their properties), and gdist.plotstats (which displays a metric MDS of networks, with star-like figures showing graph-level covariates for each structure). Similarly, network principal component analysis (Butts and Carley 2001) can be trivially implemented by the application of eigen to a graph covariance or correlation matrix. The ability to make use of standard tools for exploratory multivariate analysis is thus a salutary aspect of this approach.
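As a rough illustration of this workflow, the sketch below builds a matrix of Hamming distances among a handful of random digraphs in base R and passes it to cmdscale and hclust. The random graphs, the sizes n and m, and the choice of average linkage are arbitrary assumptions made for the example; with sna loaded, hdist would normally supply the distance matrix.

```r
set.seed(1)
n <- 6   # vertices per graph
m <- 5   # number of graphs

# m random loopless digraphs with tie probability 0.4
graphs <- lapply(seq_len(m), function(k) {
  a <- matrix(rbinom(n * n, 1, 0.4), n, n)
  diag(a) <- 0
  a
})

# Pairwise Hamming distances: number of cells on which two graphs differ
d <- matrix(0, m, m)
for (i in seq_len(m)) for (j in seq_len(m)) {
  d[i, j] <- sum(graphs[[i]] != graphs[[j]])
}

coords <- cmdscale(as.dist(d), k = 2)            # metric MDS coordinates
tree <- hclust(as.dist(d), method = "average")   # hierarchical clustering
groups <- cutree(tree, k = 2)                    # two-cluster solution
```

The objects coords and tree can then be passed to plot in the usual way.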

In addition to these general tools, specific functions are available for OLS network regression (netlm), logistic network regression (netlogit), and network canonical correlation analysis (netcancor). These models assume multiple edge sets taken from the same set of vertices, so that there is a 1:1 mapping between edge variables across networks. In this case, the models in question are exactly analogous to their conventional (non-network) equivalents, applied to the set of vectorized adjacency matrices (as with gvectorize). The primary difference between the net versions of these analyses and standard routines is the availability of more specialized diagnostic and testing mechanisms. Of particular note is support for various QAP (Hubert 1987) null hypotheses, which test the observed correspondence between graphs against the distribution of statistics arising from random reallocation of individuals to structural positions (i.e., permutation or relabeling). Simple QAP tests for bivariate network statistics (e.g., graph correlation) can also be performed using the stand-alone qaptest function. Some CUG null hypotheses are also available, where conditioning on the entire observed structure is inappropriate.
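The QAP logic can be sketched in a few lines of base R: the rows and columns of one graph are permuted simultaneously, the statistic is recomputed, and the observed value is compared with the resulting reference distribution. The function names graph_cor and qap_cor_test, the simulated graphs, and the 10% noise level are assumptions of this example; qaptest provides the supported implementation.

```r
# Correlation of off-diagonal cells (illustrative stand-in for sna's gcor).
graph_cor <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

# QAP test of the graph correlation: relabel the vertices of `a` at random.
qap_cor_test <- function(a, b, reps = 1000) {
  obs <- graph_cor(a, b)
  n <- nrow(a)
  null <- replicate(reps, {
    p <- sample(n)               # random relabeling of the vertices
    graph_cor(a[p, p], b)
  })
  list(observed = obs,
       p_ge = mean(null >= obs), # Pr(null statistic >= observed)
       p_le = mean(null <= obs)) # Pr(null statistic <= observed)
}

set.seed(7)
n <- 12
a <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(a) <- 0
flip <- matrix(rbinom(n * n, 1, 0.1), n, n)   # cells to flip (about 10%)
b <- abs(a - flip)                            # noisy copy of a
diag(b) <- 0

res <- qap_cor_test(a, b, reps = 500)
```

Because b is a lightly perturbed copy of a, the observed correlation should sit far in the upper tail of the permutation distribution.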

Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
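The claim that network regression is analogous to an ordinary regression on vectorized adjacency matrices can be checked with base R alone. The sketch below rebuilds a comparable example without sna, under the assumption that diagonal cells are excluded; rand_graph is a hypothetical helper standing in for rgraph. Only the point estimates are reproduced here, since the QAP p-values require the permutation machinery of netlm.

```r
set.seed(42)
n <- 20

# Random loopless digraph (illustrative stand-in for sna's rgraph)
rand_graph <- function(n, p = 0.5) {
  a <- matrix(rbinom(n * n, 1, p), n, n)
  diag(a) <- 0
  a
}

x <- replicate(4, rand_graph(n), simplify = FALSE)
y <- x[[1]] + 4 * x[[2]] + 2 * x[[3]]   # x[[4]] is an irrelevant predictor

# Vectorize the off-diagonal cells of every matrix in the same order
off <- row(y) != col(y)
cells <- data.frame(y  = y[off],
                    x1 = x[[1]][off], x2 = x[[2]][off],
                    x3 = x[[3]][off], x4 = x[[4]][off])

fit <- lm(y ~ x1 + x2 + x3 + x4, data = cells)
est <- coef(fit)
```

With 20 vertices there are 380 off-diagonal cells and five coefficients, which gives the 375 residual degrees of freedom seen in the netlm summary.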

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> yl <- x[1, , ] + 4 * x[2, , ] + 2 * x[3, , ]
R> yp <- apply(yl, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = yp)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173 0.680   0.320   0.503
x1           0.9411361   2.5628914 0.985   0.015   0.019
x2           4.1473292  63.2648084 1.000   0.000   0.000
x3           1.8630911   6.4436238 1.000   0.000   0.000
x4          -0.1757242   0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable) using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
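The reparameterization mentioned above can be written out explicitly. The sketch below reflects one reading of the verbal description of the consensus model: a source with competency cmp reports the true edge value with probability cmp and otherwise guesses "tie" with probability b. Under that assumption the false positive rate is (1 - cmp) b and the false negative rate is (1 - cmp)(1 - b), so the two rates sum to 1 - cmp. The function names are hypothetical and are not part of sna.

```r
# Competency/bias -> error rates (assumed reading of the consensus model)
br_to_error <- function(cmp, b) {
  c(e_plus  = (1 - cmp) * b,         # Pr(report tie | no tie)
    e_minus = (1 - cmp) * (1 - b))   # Pr(report no tie | tie)
}

# Error rates -> competency/bias; only defined when 0 < e_plus + e_minus < 1
error_to_br <- function(e_plus, e_minus) {
  total <- e_plus + e_minus
  stopifnot(total > 0, total < 1)
  c(competency = 1 - total,
    bias       = e_plus / total)
}

err <- br_to_error(cmp = 0.6, b = 0.25)        # e_plus = 0.1, e_minus = 0.3
back <- error_to_br(err[["e_plus"]], err[["e_minus"]])
```

Error rates summing to more than 1 would imply a negative competency, which is why such "negatively informative" sources can be represented in the error-rate parameterization but not in the competency/bias one.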

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
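The conditional tie probability is simple to evaluate once the baseline probability, the bias event probabilities, and the counts $s_k$ are fixed. The following sketch does so in base R; the function name bias_tie_prob, the bias labels, and the numerical values are purely illustrative assumptions and do not correspond to the arguments of bn.

```r
# Pr(e_ij | T) = 1 - (1 - p_base) * prod_k (1 - p_bias[k])^s[k]
bias_tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical dyad: baseline probability 0.1, one opportunity for a first
# bias event (probability 0.3), two opportunities for a second (probability 0.2)
p <- bias_tie_prob(p_base = 0.1,
                   p_bias = c(first = 0.3, second = 0.2),
                   s      = c(1, 2))

# With no bias opportunities the expression reduces to the baseline probability
p0 <- bias_tie_prob(0.1, c(0.3, 0.2), c(0, 0))
```

Here p equals 1 - 0.9 × 0.7 × 0.8² = 0.5968, showing how a few bias opportunities can raise the tie probability well above the baseline.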

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian 1990, and Cliff and Ord 1973; Anselin 1988, for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

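$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$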

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
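As a small worked example of one of the autocorrelation indices mentioned above, the sketch below computes Moran's I in base R using its standard definition, I = (n / S0) Σ_ij w_ij z_i z_j / Σ_i z_i², where z is the centered response and S0 is the sum of the weights. The function name morans_i and the toy path graph are assumptions of this example; nacf offers the supported implementation, including higher-order neighborhoods.

```r
# Moran's I for response vector y and weight (adjacency) matrix w
morans_i <- function(y, w) {
  stopifnot(length(y) == nrow(w), nrow(w) == ncol(w))
  z <- y - mean(y)                              # centered responses
  (length(y) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Undirected path 1 - 2 - 3 - 4 with a response that increases along the path
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
y <- c(1, 2, 3, 4)

i_stat <- morans_i(y, w)
```

The positive value (1/3 here) reflects the fact that adjacent vertices carry similar responses, which is the pattern an AR or MA weight matrix built from w would be intended to capture.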

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
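One simple way to check how much prior mass falls in the "perverse" region is by Monte Carlo, drawing false negative and false positive rates independently from their priors and counting how often they sum to more than 1. The sketch below does this for the Beta(2, 11) priors used in the example and, for contrast, for flat Beta(1, 1) priors; the helper name perverse_mass and the number of draws are arbitrary choices, and independence of the two priors is assumed.

```r
set.seed(123)
n_draws <- 1e5

# Monte Carlo estimate of Pr(em + ep > 1) under independent beta priors
perverse_mass <- function(shape1, shape2, draws = n_draws) {
  em <- rbeta(draws, shape1, shape2)   # false negative rate
  ep <- rbeta(draws, shape1, shape2)   # false positive rate
  mean(em + ep > 1)
}

mass_example <- perverse_mass(2, 11)   # priors used in the example
mass_flat    <- perverse_mass(1, 1)    # flat priors
```

Under the Beta(2, 11) priors the estimated mass is negligible, whereas flat priors place about half of their mass in the perverse region.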

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
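The behavior of the data analytic estimators is easier to see from a bare-bones version of their rules. The sketch below implements a majority-rule central graph and the union and intersection LAS in base R, assuming an informant-by-sender-by-receiver array in which informant k is also vertex k (as in the example above). The function names and the treatment of exact ties in the majority rule are assumptions of this sketch; consensus remains the supported interface.

```r
# dat[k, i, j] is informant k's report on the tie i -> j.

# Central graph (majority rule): keep ties reported by more than half of sources
central_graph <- function(dat) {
  (apply(dat, c(2, 3), mean) > 0.5) * 1
}

# Locally aggregated structure: use only the reports of sender i and receiver j
las <- function(dat, rule = c("union", "intersection")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  stopifnot(dim(dat)[1] == n)            # informant k must be vertex k
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    if (i == j) next
    own <- c(dat[i, i, j], dat[j, i, j]) # sender's and receiver's reports
    out[i, j] <- if (rule == "union") max(own) else min(own)
  }
  out
}

# Toy data: the tie 1 -> 2 is reported by informants 1 and 3, but not by 2
dat <- array(0, dim = c(3, 3, 3))
dat[1, 1, 2] <- 1
dat[3, 1, 2] <- 1

cg <- central_graph(dat)
lu <- las(dat, "union")
li <- las(dat, "intersection")
```

In this toy case the central graph and the union LAS retain the tie, while the intersection LAS drops it because a single false negative from one of the two parties is enough to veto it.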

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error Z value Pr(>|z|)
X1     -0.331259  0.010831   -30.58  <2e-16 ***
X2      0.535608  0.009448    56.69  <2e-16 ***
X3     -0.685068  0.007138   -95.98  <2e-16 ***
X4      0.691812  0.008417    82.19  <2e-16 ***
X5      0.016491  0.007890     2.09  0.0366 *
rho1.1  0.194935  0.002575    75.71  <2e-16 ***
rho2.1  0.307491  0.021167    14.53  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma 0.09597  9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project, http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In IDA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments
Example

We begin our demonstration of the sna edge set comparison routines with the simple case of graph correlation. The following illustrates the use of both simple graph correlations and structural correlations. Note that the unlabeled correlation between g2 and g3 here is 1 (since the graphs are isomorphic), but the value returned by gscor may sometimes be less than 1. This is because gscor defaults to its heuristic annealing method when seeking the structural correlation, and this method does not always identify the global maximum. Exact results can be guaranteed using exhaustive search (method = "exhaustive"), but the computational expense of this method is prohibitive for graphs of moderate to large size; see the sna manual for additional options and details.

R> g1 <- rgraph(5)
R> g2 <- rgraph(5)
R> g3 <- rmperm(g2)
R> gcor(g1, g2)
[1] -0.1336306
R> gcor(g1, g3)
[1] 0.08908708
R> gcor(g2, g3)
[1] -0.4583333
R> gscor(g1, g2, reps = 1e5)
[1] 0.5345225
R> gscor(g1, g3, reps = 1e5)
[1] 0.5345225
R> gscor(g2, g3, reps = 1e5)
[1] 1
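The distinction between labeled and structural correlation can be sketched in base R without the package. The block below is an illustration only, not the implementation used by gcor or gscor: the helper names graph_cor, all_perms, and struct_cor are hypothetical, the example graphs are fixed by hand, and the exhaustive search is practical only for very small graphs.

```r
# Product-moment correlation between two adjacency matrices, ignoring the diagonal.
graph_cor <- function(a, b) {
  off <- row(a) != col(a)
  cor(a[off], b[off])
}

# All permutations of 1..n as rows of a matrix (only sensible for very small n).
all_perms <- function(n) {
  if (n == 1) return(matrix(1, 1, 1))
  prev <- all_perms(n - 1)
  out <- NULL
  for (i in seq_len(nrow(prev))) {
    for (pos in 0:(n - 1)) {
      out <- rbind(out, append(prev[i, ], n, after = pos))
    }
  }
  out
}

# Structural correlation: best labeled correlation over all relabelings of b.
struct_cor <- function(a, b) {
  p <- all_perms(nrow(a))
  max(apply(p, 1, function(o) graph_cor(a, b[o, o])))
}

# A small digraph and a relabeled (isomorphic) copy of it.
g2 <- matrix(c(0, 1, 0, 0,
               0, 0, 1, 1,
               1, 0, 0, 0,
               0, 1, 0, 0), 4, 4, byrow = TRUE)
relabel <- c(3, 1, 4, 2)
g3 <- g2[relabel, relabel]

graph_cor(g2, g3)    # labeled correlation: below 1, since the labels differ
struct_cor(g2, g3)   # structural correlation: 1, since the graphs are isomorphic
```

Because the search visits every one of the n! relabelings, it always finds the global maximum, which is the guarantee that a heuristic search gives up in exchange for speed.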

Going beyond graph correlations, netlm allows us to relate multiple networks in an intuitive manner:

R> x <- rgraph(20, 4)
R> y <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> nl <- netlm(y, x)
R> summary(nl)

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14 0.000   1.000   0.000
x1           1.000000e+00 1.000   0.000   0.000
x2           4.000000e+00 1.000   0.000   0.000
x3           2.000000e+00 1.000   0.000   0.000
x4          -7.553990e-16 0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
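The "qap" null hypothesis reported in the diagnostics can be illustrated with a hand-rolled permutation test. The following base R sketch is a simplified stand-in, not the netlm code: it assumes a plain QAP scheme in which rows and columns of the dependent graph are permuted together, and the names fit_coefs, perm, and p_two_sided, the graph size, and the noise level are all choices made for this illustration.

```r
set.seed(42)
n <- 12
off <- row(diag(n)) != col(diag(n))   # mask selecting off-diagonal cells

# Two random binary predictor graphs; only x1 actually affects y.
x1 <- matrix(rbinom(n * n, 1, 0.5), n, n)
x2 <- matrix(rbinom(n * n, 1, 0.5), n, n)
y  <- 2 * x1 + matrix(rnorm(n * n, sd = 0.5), n, n)

# OLS on the vectorized off-diagonal cells: (intercept, x1, x2).
fit_coefs <- function(y, x1, x2) {
  unname(coef(lm(y[off] ~ x1[off] + x2[off])))
}
obs <- fit_coefs(y, x1, x2)

# QAP null: relabel the vertices of y (rows and columns together) and refit.
reps <- 200
perm <- t(replicate(reps, {
  p <- sample(n)
  fit_coefs(y[p, p], x1, x2)
}))

# Two-sided permutation p-values, one per coefficient.
p_two_sided <- colMeans(abs(perm) >= matrix(abs(obs), reps, 3, byrow = TRUE))

obs
p_two_sided
```

Permuting rows and columns jointly preserves the dependence structure within y while breaking its association with the predictors, which is why the permutation distribution serves as a reference for the observed coefficients.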

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173  0.680   0.320   0.503
x1           0.9411361   2.5628914  0.985   0.015   0.019
x2           4.1473292  63.2648084  1.000   0.000   0.000
x3           1.8630911   6.4436238  1.000   0.000   0.000
x4          -0.1757242   0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

Null Hypothesis: qap
Replications: 1000
Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
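The high density of the dependent graph follows directly from the data-generating process. As a rough check, and assuming that each predictor graph has independent ties with probability 0.5 (an assumption of this sketch about the simulated predictors), the expected tie probability can be computed by averaging the inverse logit over the eight possible combinations of the three relevant predictors:

```r
# All 0/1 combinations of the three predictors that enter the linear predictor.
combos <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1)

# Linear predictor used in the example: x1 + 4 * x2 + 2 * x3.
eta <- with(combos, x1 + 4 * x2 + 2 * x3)

# Inverse logit gives the tie probability for each combination.
p_tie <- plogis(eta)

# Each combination is equally likely under the assumption above,
# so the expected density is the simple average.
expected_density <- mean(p_tie)
expected_density
```

The result is a little under 0.9, in line with the observed density, so only about one dyad in ten is a null and a classifier that predicts a tie everywhere is already right most of the time.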

2.7. Network inference and process models

A final category of functions supplied by sna are those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
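The reparameterization mentioned above can be made concrete. The sketch below is derived from the verbal description of the two reporting processes rather than taken from the package: a source that "knows" with probability c and otherwise guesses "tie present" with probability b has false negative rate (1 - c)(1 - b) and false positive rate (1 - c)b, so the two rates sum to 1 - c. The function names br_to_error and error_to_br are hypothetical.

```r
# Competency/bias parameters -> false negative / false positive rates.
# competency: probability that the source knows the true edge value
# bias: probability of guessing "tie present" when the source does not know
br_to_error <- function(competency, bias) {
  c(false_neg = (1 - competency) * (1 - bias),
    false_pos = (1 - competency) * bias)
}

# Inverse map; only defined when the total error rate lies strictly between 0 and 1.
error_to_br <- function(false_neg, false_pos) {
  total <- false_neg + false_pos
  stopifnot(total > 0, total < 1)
  c(competency = 1 - total, bias = false_pos / total)
}

err  <- br_to_error(competency = 0.6, bias = 0.25)
back <- error_to_br(err[["false_neg"]], err[["false_pos"]])
err
back
```

The inverse map breaks down once the error rates sum to 1 or more, since the implied competency would be zero or negative; this is the region in which reports are uninformative or negatively informative, and it is reachable only under the error-rate parameterization.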

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
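As a small worked instance of the conditional tie probability, the following base R sketch evaluates the formula for given baseline and bias probabilities. The function name biased_tie_prob and the numerical values are illustrative assumptions; the code is not part of bn.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# p_base: baseline nomination probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      number of opportunities for each bias event on the (i, j) dyad
biased_tie_prob <- function(p_base, p_bias = numeric(0), s = numeric(0)) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

biased_tie_prob(0.1)                         # no bias events: the baseline, 0.1
biased_tie_prob(0.1, c(0.3, 0.2), c(1, 2))   # 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
```

The product is the probability that no bias event fires and the baseline event also fails, so each additional bias opportunity can only raise the tie probability.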

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

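$$y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu, \qquad (5)$$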

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
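For simulation purposes it can help to solve Equations 4 and 5 for the responses. The following rearrangement is a sketch that assumes both matrices in parentheses are invertible (the symbol $I$ for the $n \times n$ identity matrix is introduced here for illustration):

```latex
\begin{align*}
\varepsilon &= \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \\
y &= \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1} \bigl(X\beta + \varepsilon\bigr)
   = \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1}
     \Bigl(X\beta + \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu\Bigr).
\end{align*}
```

Read this way, synthetic data can be generated by drawing $\nu$, solving one linear system for $\varepsilon$, and solving a second for $y$. The form also makes the AR/MA distinction visible: the AR operator multiplies both $X\beta$ and the disturbances, while the MA operator acts on $\nu$ alone.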

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20    Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
    Max: 1.003116
    Med: 0.9992194
    IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
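Why the region em + ep > 1 is "perverse" can be seen from a single application of Bayes' rule. The sketch below is a deliberately simplified illustration, not the bbnam sampler: it treats the error rates as known, assumes informants report independently given the true edge state, and uses a hypothetical helper named edge_posterior.

```r
# Posterior probability that a tie exists, given 0/1 reports from informants
# with known false negative (em) and false positive (ep) rates.
edge_posterior <- function(reports, em, ep, prior = 0.5) {
  lik_tie    <- prod(ifelse(reports == 1, 1 - em, em))   # Pr(reports | tie)
  lik_no_tie <- prod(ifelse(reports == 1, ep, 1 - ep))   # Pr(reports | no tie)
  prior * lik_tie / (prior * lik_tie + (1 - prior) * lik_no_tie)
}

# Error rates at the means used in the example: a positive report is informative.
informative <- edge_posterior(1, em = 0.375, ep = 0.038)

# Error rates summing to more than 1: a positive report now counts against the tie.
perverse <- edge_posterior(1, em = 0.7, ep = 0.6)

c(informative = informative, perverse = perverse)
```

When em + ep exceeds 1, the probability of a positive report is higher for an absent tie than for a present one, so each report shifts belief in the opposite direction from its face value.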

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
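The sensitivity of the intersection LAS to false negatives is easy to see in a toy case. The base R sketch below is an illustration under stated assumptions rather than the consensus code: informants are taken to be the actors themselves, dat[k, i, j] holds informant k's report on the i to j tie, the helper names are hypothetical, and the majority rule used for the central graph treats an exact 50% split as "absent", which is a choice made for this sketch.

```r
# LAS rules use only the reports of the two parties to each potential tie.
las_combine <- function(dat, both) {
  n <- dim(dat)[2]
  out <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    ri <- dat[i, i, j] == 1   # sender's own report
    rj <- dat[j, i, j] == 1   # receiver's own report
    out[i, j] <- as.numeric(if (both) ri && rj else ri || rj)
  }
  out
}
las_intersection <- function(dat) las_combine(dat, both = TRUE)
las_union        <- function(dat) las_combine(dat, both = FALSE)

# Majority rule across all informants.
central_graph <- function(dat) (apply(dat, c(2, 3), mean) > 0.5) * 1

# True graph on three actors: 1 -> 2, 2 -> 3, 3 -> 1.
g <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), 3, 3, byrow = TRUE)

# Every informant reports g exactly, except that actor 1 omits the 1 -> 2 tie.
dat <- array(0, dim = c(3, 3, 3))
for (k in 1:3) dat[k, , ] <- g
dat[1, 1, 2] <- 0

mean(las_intersection(dat) == g)   # one cell wrong: the omitted tie is lost
mean(las_union(dat) == g)          # unaffected by a single omission
mean(central_graph(dat) == g)      # majority of informants still report the tie
```

A single omission by one party removes the tie under the intersection rule, whereas the union rule and the majority rule both absorb it; the reverse holds for false positives, which the union rule propagates.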

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error Z value Pr(>|z|)
X1     -0.331259  0.010831   -30.58  <2e-16 ***
X2      0.535608  0.009448    56.69  <2e-16 ***
X3     -0.685068  0.007138   -95.98  <2e-16 ***
X4      0.691812  0.008417    82.19  <2e-16 ***
X5      0.016491  0.007890     2.09  0.0366 *
rho1.1  0.194935  0.002575    75.71  <2e-16 ***
rho2.1  0.307491  0.021167    14.53  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9    BIC: -85.65

Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0    BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form: (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferlioj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems. Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package, User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

OLS Network Model

Residuals:
           0%           25%           50%           75%          100%
-2.136676e-13 -6.547650e-16  5.123264e-16  1.345843e-15  7.075165e-14

Coefficients:
            Estimate       Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -1.467115e-14  0.000   1.000   0.000
x1           1.000000e+00  1.000   0.000   0.000
x2           4.000000e+00  1.000   0.000   0.000
x3           2.000000e+00  1.000   0.000   0.000
x4          -7.553990e-16  0.369   0.631   0.756

Residual standard error: 1.169e-14 on 375 degrees of freedom
Multiple R-squared: 1    Adjusted R-squared: 1
F-statistic: 3.65e+30 on 4 and 375 degrees of freedom, p-value: 0

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Coefficient Distribution Summary:

       (intercept)         x1         x2         x3         x4
Min     -2.6048970 -2.9689678 -3.5940257 -2.9888472 -1.5687343
1stQ    -0.6779707 -0.6739579 -0.6980733 -0.7469624 -0.9732831
Median  -0.0841683 -0.0090468  0.0003289 -0.0116757 -0.4346029
Mean    -0.0256936 -0.0249585 -0.0161372 -0.0055288 -0.0080178
3rdQ     0.6930508  0.6393521  0.6352920  0.7064120  0.8601390
Max      2.5434373  2.7231537  3.0464596  3.6938260  1.6294713
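The "qap" null hypothesis named in the test diagnostics refers to a quadratic assignment procedure, in which the vertices of the dependent graph are relabeled at random (the same permutation applied to rows and columns) and the statistic is recomputed. A minimal base R sketch of that idea for a single correlation statistic follows; it is an illustration under simplified assumptions, not the routine sna uses internally, and the helper name graph_cor is hypothetical.

```r
set.seed(42)
n <- 10

# Two valued adjacency matrices; self-ties (the diagonal) are ignored.
x <- matrix(rnorm(n * n), n, n)
y <- 2 * x + matrix(rnorm(n * n), n, n)
diag(x) <- NA
diag(y) <- NA

# Hypothetical helper: product-moment correlation over off-diagonal cells.
graph_cor <- function(a, b) {
  cor(as.vector(a), as.vector(b), use = "complete.obs")
}

obs <- graph_cor(y, x)

# QAP null distribution: permute rows and columns of y together, which keeps
# the structure of y intact while breaking its alignment with x.
reps <- 500
null <- replicate(reps, {
  p <- sample(n)
  graph_cor(y[p, p], x)
})

p_le  <- mean(null <= obs)            # analogous to Pr(<=b)
p_ge  <- mean(null >= obs)            # analogous to Pr(>=b)
p_abs <- mean(abs(null) >= abs(obs))  # analogous to Pr(>=|b|)
print(c(obs = obs, p_le = p_le, p_ge = p_ge, p_abs = p_abs))
```

The three proportions play the same role as the Pr(<=b), Pr(>=b), and Pr(>=|b|) columns in the printed coefficient table.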

As noted earlier, OLS network regression is problematic when the dependent graph is unvalued. In this case, netlogit may be preferred. Its usage is directly analogous, as in the following example:

R> x <- rgraph(20, 4)
R> y.l <- x[1,,] + 4 * x[2,,] + 2 * x[3,,]
R> y.p <- apply(y.l, c(1, 2), function(a) 1 / (1 + exp(-a)))
R> y <- rgraph(20, tprob = y.p)
R> nl <- netlogit(y, x)
R> summary(nl)

Network Logit Model

Coefficients:
            Estimate    Exp(b)      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)  0.3077180   1.3603173  0.680   0.320   0.503
x1           0.9411361   2.5628914  0.985   0.015   0.019
x2           4.1473292  63.2648084  1.000   0.000   0.000
x3           1.8630911   6.4436238  1.000   0.000   0.000
x4          -0.1757242   0.8388493  0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0    1
0    0    0
1   39  341

Total Fraction Correct: 0.8973684
Fraction Predicted 1s Correct: 0.8973684
Fraction Predicted 0s Correct: NaN
False Negative Rate: 0
False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model’s parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
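The classification diagnostics in the netlogit summary follow directly from the printed contingency table. As a small check in base R (the variable names are illustrative only), the table can be rebuilt and the five summary rates recomputed:

```r
# Contingency table from the netlogit summary: predicted (rows) x actual (cols).
tab <- matrix(c(0, 39, 0, 341), nrow = 2,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

total_correct <- sum(diag(tab)) / sum(tab)
pred1_correct <- tab["1", "1"] / sum(tab["1", ])
pred0_correct <- tab["0", "0"] / sum(tab["0", ])  # 0/0: no 0s were predicted
false_neg     <- tab["0", "1"] / sum(tab[, "1"])  # actual 1s predicted as 0
false_pos     <- tab["1", "0"] / sum(tab[, "0"])  # actual 0s predicted as 1
density       <- sum(tab[, "1"]) / sum(tab)       # share of dyads with a tie

print(c(total_correct, pred1_correct, pred0_correct, false_neg, false_pos, density))
```

Because the model never predicts an absent tie, the fraction of predicted 0s that are correct is undefined (NaN) and every actual 0 counts as a false positive, which is the pattern the paragraph above attributes to the high density of the dependent graph.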

2.7. Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions, and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure, and in the former’s allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
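The correspondence between the two reporting models can be made concrete. The sketch below assumes the Batchelder-Romney process exactly as described in the text: with probability comp a source knows the true edge value, and otherwise it reports an edge with probability bias. Under that assumption, the implied false positive rate is (1 - comp) * bias and the implied false negative rate is (1 - comp) * (1 - bias). The function names are hypothetical and the algebra is a derivation from the description above, not code taken from sna.

```r
# Competency/bias -> error rates (assumed "know or guess" reporting process).
br_to_error <- function(comp, bias) {
  list(ep = (1 - comp) * bias,        # false positive rate, e^+
       em = (1 - comp) * (1 - bias))  # false negative rate, e^-
}

# Error rates -> competency/bias; only valid when ep + em < 1,
# since otherwise the implied competency would not be positive.
# (If ep + em == 0 the bias is undefined and NaN is returned.)
error_to_br <- function(ep, em) {
  stopifnot(ep + em < 1)
  list(comp = 1 - ep - em,
       bias = ep / (ep + em))
}

rates <- br_to_error(comp = 0.6, bias = 0.25)
back  <- error_to_br(rates$ep, rates$em)
print(unlist(rates))
print(unlist(back))
```

The restriction ep + em < 1 in the inverse map mirrors the condition stated above: error rates summing to more than 1 correspond to negatively informative reports, which a competency between 0 and 1 cannot express.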

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

\[ \Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)}, \]

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i,j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model, then, involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. The bn function currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
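To make the conditional nomination probability concrete, the following base R sketch evaluates it for a single directed dyad. The function name bn_edge_prob and the numerical values are hypothetical; the bias types are left generic, since which bias events apply to a dyad depends on the model specification and the state of the trace.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: baseline nomination probability, Pr(B_e)
#   p_bias: vector of bias event probabilities, Pr(B_k)
#   s:      vector of counts s_k of each bias event that could have occurred
bn_edge_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# No bias events available: the probability reduces to the baseline.
p_none <- bn_edge_prob(0.1, p_bias = c(0.5, 0.2), s = c(0, 0))

# One opportunity for the first bias and two for the second:
# 1 - 0.9 * 0.5 * 0.8^2 = 0.712
p_some <- bn_edge_prob(0.1, p_bias = c(0.5, 0.2), s = c(1, 2))

print(c(p_none, p_some))
```

The edge fails to appear only if the baseline event and every potential bias event all fail, which is why the complement is a product of failure probabilities.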

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973) and Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

\[ y = \left( \sum_{i=1}^{w} \theta_i W_i \right) y + X\beta + \varepsilon, \qquad (4) \]

\[ \varepsilon = \left( \sum_{i=1}^{z} \psi_i Z_i \right) \varepsilon + \nu, \qquad (5) \]

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter’s inclusion of impact from neighbors’ covariate scores: an AR term implies that each individual’s response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran’s I (Moran 1950) and Geary’s C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
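Equations (4) and (5) define the responses implicitly; solving them gives a reduced form that is convenient for simulation. With a single AR matrix W and a single MA matrix Z, (5) gives ε = (I − ψZ)^{-1} ν and (4) gives y = (I − θW)^{-1}(Xβ + ε), provided both matrices are invertible. The base R sketch below simulates from this reduced form and confirms that the result satisfies the original equations. All parameter values and the helper rand_graph are hypothetical, chosen so that θ and ψ are small enough for the inverses to exist.

```r
set.seed(7)
n <- 30

# Hypothetical helper: random binary digraph without self-ties.
rand_graph <- function(n, p = 0.1) {
  g <- matrix(rbinom(n * n, 1, p), n, n)
  diag(g) <- 0
  g
}

W <- rand_graph(n)                 # AR interaction matrix (w = 1)
Z <- rand_graph(n)                 # MA interaction matrix (z = 1)
X <- matrix(rnorm(n * 2), n, 2)    # covariates
beta  <- c(1, -0.5)
theta <- 0.05                      # AR parameter
psi   <- 0.1                       # MA parameter
sigma <- 0.1

nu  <- rnorm(n, 0, sigma)          # iid disturbances
I_n <- diag(n)

# Reduced form: solve (5) for eps, then (4) for y.
eps <- solve(I_n - psi * Z, nu)
y   <- solve(I_n - theta * W, X %*% beta + eps)

# Both defining equations should hold up to numerical error.
err4 <- max(abs(y - (theta * W %*% y + X %*% beta + eps)))
err5 <- max(abs(eps - (psi * Z %*% eps + nu)))
print(c(err4, err5))
```

The same two-step solve is what distinguishes the effect types in practice: the AR inverse is applied to covariates and disturbances together, whereas the MA inverse touches only the disturbances.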


Example

To demonstrate the use of sna’s network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants’ false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject these data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical Beta(2, 11) priors for all error rates. The summary function for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i,,] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[,1] <- 2
R> pem[,2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[,1] <- 2
R> pep[,2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115
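The potential scale reduction reported in the MCMC diagnostics compares between-chain and within-chain variability for each monitored quantity; values near 1 suggest the chains are sampling from the same distribution. A minimal base R sketch of the simplified statistic follows. The function name psr is hypothetical and this is an illustration of the general formula, not the code behind potscalered.mcmc.

```r
# Potential scale reduction, sqrt(Rhat), for one scalar quantity.
#   draws: matrix with one row per retained draw and one column per chain
psr <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(draws))       # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of the target variance
  sqrt(var_hat / W)
}

set.seed(11)
# Five well-mixed chains of 20 draws each (same layout as the example above).
mixed <- matrix(rnorm(100), nrow = 20, ncol = 5)

# Two chains stuck in different regions: the statistic is far above 1.
stuck <- cbind(c(-1, 1, -1, 1), c(9, 11, 9, 11))

psr_mixed <- psr(mixed)
psr_stuck <- psr(stuck)
print(c(psr_mixed, psr_stuck))
```

For the stuck chains the within-chain variance is 4/3 while the pooled estimate is 51, so the statistic is sqrt(38.25), roughly 6.2; values this far above 1 would indicate that more burn-in or more draws are needed.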

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user’s modeling assumptions and of the data itself.
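Both claims in this paragraph can be quantified for the example. The base R sketch below compares the Beta(2, 11) prior mean with the means of the distributions used to generate the true error rates, and computes the prior mass on the region em + ep > 1 by numerical integration. It assumes the two error rates are independent under the prior; the variable names are illustrative only.

```r
beta_mean <- function(a, b) a / (a + b)

true_ep_mean <- beta_mean(1, 25)   # generating distribution for false positives
true_em_mean <- beta_mean(15, 25)  # generating distribution for false negatives
prior_mean   <- beta_mean(2, 11)   # common prior for both error rates

# Prior probability that em + ep > 1 under independent Beta(2, 11) priors:
# integrate over em the density of em times Pr(ep > 1 - em).
perverse_mass <- integrate(
  function(em) dbeta(em, 2, 11) * pbeta(1 - em, 2, 11, lower.tail = FALSE),
  lower = 0, upper = 1
)$value

print(c(true_ep_mean = true_ep_mean, true_em_mean = true_em_mean,
        prior_mean = prior_mean, perverse_mass = perverse_mass))
```

The prior mean of about 0.15 overstates the typical false positive rate and understates the typical false negative rate, yet the prior places well under 0.1% of its mass on the region where reports would be negatively informative, which is consistent with the robustness described above.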

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725
R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905
R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575
R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
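The behavior of the simpler estimators is easy to reproduce without the package. The base R sketch below assumes the usual definitions: the central graph as an element-wise majority vote across informants, and the locally-aggregated structure (LAS) as using only the reports of the two actors involved in each potential tie, requiring both (intersection) or either (union) to report it. The function names are hypothetical and this is not the implementation behind consensus.

```r
# Simulate informant reports: dat[k, i, j] is informant k's report on i -> j.
simulate_reports <- function(g, em, ep) {
  n <- nrow(g)
  dat <- array(0, dim = c(length(em), n, n))
  for (k in seq_along(em)) {
    p <- g * (1 - em[k]) + (1 - g) * ep[k]   # Pr(report a tie) per dyad
    r <- matrix(rbinom(n * n, 1, p), n, n)
    diag(r) <- 0
    dat[k, , ] <- r
  }
  dat
}

# Central graph (assumed): tie present if more than half of informants report it.
central_graph <- function(dat) (apply(dat, c(2, 3), mean) > 0.5) * 1

# LAS (assumed): the i -> j tie is judged from the reports of i and j only.
las <- function(dat, rule = c("intersection", "union")) {
  rule <- match.arg(rule)
  n <- dim(dat)[2]
  stopifnot(dim(dat)[1] == n)   # informants must be the vertices themselves
  out <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    a <- dat[i, i, j] == 1
    b <- dat[j, i, j] == 1
    out[i, j] <- as.numeric(if (rule == "intersection") a && b else a || b)
  }
  out
}

accuracy <- function(est, g) mean(est == g)

set.seed(3)
n <- 20
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0
dat <- simulate_reports(g, em = rep(0.375, n), ep = rep(0.038, n))

print(c(las_intersection = accuracy(las(dat, "intersection"), g),
        las_union        = accuracy(las(dat, "union"), g),
        central_graph    = accuracy(central_graph(dat), g)))
```

Requiring both parties to report a tie means a true tie survives only if neither of them misses it, which is why the intersection rule compounds false negatives, whereas the majority vote pools all 20 reports and tolerates individual errors as long as error rates stay below 0.5.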

As a final example of sna’s model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate  Std. Error Z value Pr(>|z|)
X1     -0.331259  0.010831   -30.58   <2e-16 ***
X2      0.535608  0.009448    56.69   <2e-16 ***
X3     -0.685068  0.007138   -95.98   <2e-16 ***
X4      0.691812  0.008417    82.19   <2e-16 ***
X5      0.016491  0.007890     2.09   0.0366 *
rho1.1  0.194935  0.002575    75.71   <2e-16 ***
rho2.1  0.307491  0.021167    14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma 0.09597  9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
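The information criteria in the goodness-of-fit block of the summary output can be checked by hand from the reported log likelihoods. The sketch below assumes the usual definitions AIC = -2 logL + 2k and BIC = -2 logL + k log(n), with the parameter count k taken as n minus the reported residual degrees of freedom. The heuristic log Bayes factor matches the BIC difference numerically; that lnam computes it in this way is an inference from the printed figures, not something stated in the text.

```r
# Recompute the information criteria reported by summary(fit).
n <- 50                  # observations in the example
loglik_model <- 58.47    # "Model log likelihood"
df_resid_model <- 42     # residual degrees of freedom (with Sigma)
loglik_null <- -82.48    # "Null log likelihood"
df_resid_null <- 48

k_model <- n - df_resid_model   # 8 parameters: 5 betas, 2 rhos, sigma
k_null <- n - df_resid_null     # 2 parameters: mean and standard deviation

aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

aic_model <- aic(loglik_model, k_model)
bic_model <- bic(loglik_model, k_model, n)
aic_null <- aic(loglik_null, k_null)
bic_null <- bic(loglik_null, k_null, n)

aic_diff <- aic_null - aic_model   # compare with "AIC difference"
bic_diff <- bic_null - bic_model   # compare with "Heuristic Log Bayes Factor"
```

Because the inputs are the rounded values printed in the summary, the recomputed figures agree with the reported ones only up to rounding.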

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5 R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II: Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In DA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems: Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

             Estimate    Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept)   0.3077180   1.3603173 0.680   0.320   0.503
x1            0.9411361   2.5628914 0.985   0.015   0.019
x2            4.1473292  63.2648084 1.000   0.000   0.000
x3            1.8630911   6.4436238 1.000   0.000   0.000
x4           -0.1757242   0.8388493 0.318   0.682   0.642

Goodness of Fit Statistics:

Null deviance: 526.7919 on 380 degrees of freedom
Residual deviance: 174.1572 on 375 degrees of freedom
Chi-Squared test of fit improvement:
    352.6347 on 5 degrees of freedom, p-value 0
AIC: 184.1572    BIC: 203.8580
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.481324
    (Dn-Dr)/Dn: 0.6694004

Contingency Table (predicted (rows) x actual (cols)):

     0   1
0    0   0
1   39 341

    Total Fraction Correct: 0.8973684
    Fraction Predicted 1s Correct: 0.8973684
    Fraction Predicted 0s Correct: NaN
    False Negative Rate: 0
    False Positive Rate: 1

Test Diagnostics:

    Null Hypothesis: qap
    Replications: 1000
    Distribution Summary:

       (intercept)        x1        x2        x3        x4
Min      -1.253710 -1.160806 -1.270806 -1.295749 -1.252300
1stQ     -0.215404 -0.236393 -0.229377 -0.278976 -0.250322
Median    0.078514  0.022337 -0.001591 -0.020205  0.001053
Mean      0.093105  0.025854  0.004520 -0.017570 -0.002262
3rdQ      0.408121  0.269836  0.239821  0.236166  0.252251
Max       1.704128  1.408468  1.214650  1.100783  1.533500

It may be noted that, in this case, the model diagnostics indicate that the model is not terribly effective at predicting the absence of ties; this is largely a consequence of the high density in the dependent graph (approximately 0.90), and is analogous to the usual challenge of predicting rare events with a logistic regression model. Nevertheless, we see that the model's parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.
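The classification diagnostics discussed above follow directly from the printed contingency table. The base R sketch below recomputes them; the variable names are ours, and the definitions (rates taken relative to the actual class totals) are inferred from the printed values rather than from the netlogit source.

```r
# Contingency table from the output above: rows = predicted, cols = actual.
tab <- matrix(c(0, 0,
                39, 341), nrow = 2, byrow = TRUE,
              dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

total_correct <- sum(diag(tab)) / sum(tab)          # (0 + 341) / 380
pred1_correct <- tab["1", "1"] / sum(tab["1", ])    # 341 / 380
pred0_correct <- tab["0", "0"] / sum(tab["0", ])    # 0 / 0, hence NaN
false_neg_rate <- tab["0", "1"] / sum(tab[, "1"])   # 0 / 341
false_pos_rate <- tab["1", "0"] / sum(tab[, "0"])   # 39 / 39
density <- sum(tab[, "1"]) / sum(tab)               # share of dyads with a tie
```

Since the model predicts a tie for every dyad, the fraction of correct predictions equals the density of the dependent graph, and the fraction of correctly predicted 0s is undefined.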

2.7. Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to “know” and correctly generate the true value of an edge on which they report, otherwise producing a “guess” based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood.

A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
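The reparameterization mentioned above can be made explicit with a short sketch. We assume the following reading of the reporting process: with probability d (competency) an informant knows and reports the true edge value, and otherwise reports a tie with probability b (bias) regardless of the truth. Under that assumption the false positive rate is (1 - d) b and the false negative rate is (1 - d)(1 - b), so the two rates sum to 1 - d. The function names below are hypothetical and are not part of sna.

```r
# Map between (competency d, bias b) and (false positive ep, false negative em).
# Assumed reporting process: with probability d the informant knows the true
# edge value; otherwise they report a tie with probability b.

br_to_rates <- function(d, b) {
  list(ep = (1 - d) * b,         # true 0 reported as 1: a guess of 1
       em = (1 - d) * (1 - b))   # true 1 reported as 0: a guess of 0
}

rates_to_br <- function(ep, em) {
  stopifnot(ep + em < 1)         # otherwise competency would be <= 0
  list(d = 1 - ep - em,
       b = if (ep + em > 0) ep / (ep + em) else NA_real_)
}

rates <- br_to_rates(d = 0.6, b = 0.25)   # ep = 0.1, em = 0.3
back <- rates_to_br(rates$ep, rates$em)   # recovers d = 0.6, b = 0.25
```

The inverse map exists only when the error rates sum to less than 1, which is the condition stated in the text; error rates summing to more than 1 would imply a negative competency, corresponding to the negatively informative reports that only the Butts (2003) parameterization admits.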

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical “tracing” process. This process may be described loosely as follows. One begins with a small “seed” set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be “biased” in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model, then, involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not in general known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
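As a small worked illustration of the conditional tie probability above, the following base R sketch evaluates the expression for given baseline and bias probabilities. The function name tie_prob, the bias labels, and all numerical values are hypothetical and chosen only for illustration; this is not the estimation code used by bn.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
# p_base: baseline tie probability Pr(B_e)
# p_bias: vector of bias event probabilities Pr(B_k)
# s:      vector of counts s_k(i, j, T), one per bias type
tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# Hypothetical bias types and values, chosen only for illustration.
p_bias <- c(bias_a = 0.5, bias_b = 0.2)
no_bias <- tie_prob(0.1, p_bias, s = c(0, 0))   # baseline only: 0.1
biased <- tie_prob(0.1, p_bias, s = c(1, 2))    # 1 - 0.9 * 0.5 * 0.8^2 = 0.712
```

With no opportunities for bias events the probability reduces to the baseline, and each additional opportunity can only raise it, consistent with the statement that the tie is observed with certainty if any bias event occurs.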

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

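$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$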

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
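To illustrate the kind of autocorrelation index mentioned above, the following base R sketch computes Moran's I for a vertex attribute under a given weight matrix, using the standard definition I = (n / S0) sum_ij w_ij z_i z_j / sum_i z_i^2, where z is the centered attribute and S0 is the sum of the weights. The function name morans_i is ours; this is a minimal illustration and not the implementation used by nacf.

```r
# Moran's I for attribute x under weight matrix w (base R, not nacf itself).
morans_i <- function(x, w) {
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Four vertices on a path 1 - 2 - 3 - 4 with symmetric 0/1 weights.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)

smooth <- morans_i(c(1, 2, 3, 4), w)          # neighbors alike: positive
alternating <- morans_i(c(1, -1, 1, -1), w)   # neighbors opposed: negative
```

An attribute that varies smoothly along the path yields a positive index, while one that alternates between adjacent vertices yields a negative one; inspecting such values for candidate weight matrices is one heuristic way to choose among them before fitting.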


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary method for the returned object describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts Hierarchical Bayes Model for Network EstimationInformant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution

a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15a1 000 000 000 100 100 000 100 100 000 000 100 100 000 000 000a2 000 000 100 100 100 000 000 100 100 100 000 000 000 000 100a3 000 100 000 100 100 100 000 000 000 000 100 000 000 100 100a4 001 100 100 000 000 000 100 100 000 100 000 000 000 000 100a5 100 100 100 100 000 100 000 000 100 000 100 100 100 100 000a6 000 000 100 000 000 000 100 000 100 100 018 100 000 000 100a7 100 100 000 100 000 000 000 100 000 000 000 100 000 000 100a8 000 100 100 100 100 100 000 000 100 000 000 100 000 100 000a9 000 000 100 000 100 000 100 100 000 100 000 000 000 100 100a10 000 000 000 000 000 000 100 100 100 000 100 000 000 100 000a11 000 000 100 100 100 000 000 000 000 000 000 100 100 000 100a12 100 100 000 000 100 000 000 000 000 000 100 000 000 000 000

Journal of Statistical Software 41

a13 000 000 000 100 100 100 100 100 000 000 100 100 000 000 000a14 100 000 000 000 000 100 000 000 000 000 000 100 000 000 000a15 100 100 000 100 000 000 100 000 100 000 000 000 000 000 000a16 000 100 100 000 100 100 000 100 000 000 000 000 000 000 100a17 100 000 100 000 000 100 000 000 100 000 000 000 000 100 000a18 100 000 100 000 000 000 000 100 000 000 100 100 000 100 100a19 000 000 100 000 100 100 000 100 000 000 100 100 100 100 100a20 000 100 000 100 100 000 000 000 000 000 100 000 000 000 000

a16 a17 a18 a19 a20a1 100 100 100 000 000a2 100 000 000 100 100a3 000 000 100 000 100a4 000 100 000 100 100a5 100 100 000 000 100a6 000 000 000 100 000a7 100 000 000 000 000a8 000 000 100 000 100a9 100 100 100 100 000a10 000 100 100 100 000a11 100 100 000 100 100a12 100 000 100 100 000a13 000 000 100 000 100a14 000 000 000 000 000a15 100 000 100 000 100a16 000 000 100 000 000a17 000 000 100 000 100a18 000 000 000 100 000a19 000 000 000 000 100a20 100 100 100 100 000

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

    Replicate Chains: 5
    Burn Time: 300
    Draws per Chain: 20    Total Draws: 100
    Potential Scale Reduction (G&R's sqrt(Rhat)):
        Max: 1.003116
        Med: 0.9992194
        IQR: 0.0004545115
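For readers unfamiliar with the potential scale reduction statistic reported in the diagnostics above, the following base R sketch computes sqrt(Rhat) for a single scalar quantity from several chains. It assumes the common formulation in which the pooled variance estimate is ((n - 1)/n) W + B/n, with W the mean within-chain variance and B the between-chain variance; the function name psrf is ours, and potscalered.mcmc may differ in detail.

```r
# draws: matrix with one column per chain and one row per retained draw.
psrf <- function(draws) {
  n <- nrow(draws)
  w <- mean(apply(draws, 2, var))       # mean within-chain variance
  b <- n * var(colMeans(draws))         # between-chain variance
  sqrt(((n - 1) / n * w + b / n) / w)   # sqrt(Rhat)
}

# Deterministic toy chains: matching chains vs. chains with shifted means.
base <- c(-1, 0, 1, 0, -1, 1)
agree <- cbind(base, rev(base), base)
disagree <- cbind(base, base + 5, base + 10)

r_agree <- psrf(agree)         # chain means coincide: value slightly below 1
r_disagree <- psrf(disagree)   # chain means far apart: value well above 1
```

Values near 1, as in the summary above, suggest that the chains are sampling from similar distributions; values well above 1 indicate that the chains have not yet mixed. Under this formulation the statistic can fall slightly below 1 in finite samples.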

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and the data itself.
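One quick way to check how much prior mass falls in the “perverse” region is to bound it using the marginal priors. The sketch below does so for the beta(2, 11) priors used in this example. It relies only on the observation that em + ep > 1 requires at least one of the two rates to exceed 0.5, so the bound holds whether or not the two priors are independent.

```r
# Upper-tail mass of a beta(2, 11) prior above 0.5.
p_half <- pbeta(0.5, shape1 = 2, shape2 = 11, lower.tail = FALSE)

# em + ep > 1 requires at least one rate above 0.5, so by the union bound
# the prior mass on the "perverse" region is at most twice this value.
perverse_bound <- 2 * p_half
```

For these priors the bound is well under one percent, which is consistent with the good behavior observed above.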

Having obtained a Bayesian point estimate we can also evaluate the performance of variousclassical network estimators The consensus function allows us to calculate several includingthe union and intersection LAS central graph and Romney-Batchelder model

Rgt mean(consensus(dat method = LASintersection) == g)

[1] 07725

Rgt mean(consensus(dat method = LASunion) == g)

[1] 0905

Rgt mean(consensus(dat method = centralgraph) == g)

[1] 09575

Rgt mean(consensus(dat method = romneybatchelder) == g)

44 Social Network Analysis with sna

Estimated competency scores[1] 05384305 05152780 04482434 05333154 07128820 05920044 06278100[8] 07532642 03863239 05535066 05120474 06065419 05147395 06447705[15] 06046575 06121955 07115359 03448647 03351731 04501279Estimated bias parameters[1] 013137940 035170786 006013660 028684742 009962490 004767398[7] 008915006 015302781 022559772 007431412 011489655 015412247[13] 005894590 008052288 009550557 006195760 014675686 024625026[19] 004302486 010195838[1] 1

For this scenario the intersection LAS is an especially poor choice (since it exacerbates theeffects of false negatives) the central graph and Romney-Batchelder models are far betterThe performance of the central graph will degrade quickly however when either false positiveor false negative rates approach or exceed 05 The two likelihood-based methods (bbnam andRomney-Batchelder) can still be quite robust in such such cases provided that total errorrates (false positive plus false negative) are less than 1

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
   Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
   Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
   Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
   AIC: -100.9 BIC: -85.65

   Null model: meanstd
   Null log likelihood: -82.48 on 48 degrees of freedom
   AIC: 169.0 BIC: 172.8
   AIC difference (model versus null): 269.9
   Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). "Metric Inference for Social Networks." Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). "Test Theory Without an Answer Key." Psychometrika, 53(1), 71–92.

Bonacich P (1987). "Power and Centrality: A Family of Measures." American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems: Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package. User's Manual, Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/
Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

parameter estimates are quite close to the true values, and that the QAP test correctly identifies the irrelevant predictors.

2.7. Network inference and process models

A final category of functions supplied by sna comprises those implementing various network inference and process models. Although the package still contains a legacy function for fitting simple exponential random graph models via maximum pseudo-likelihood methods (pstar), it is strongly recommended that users employ the more modern tools of the ergm package for this purpose. There are several other models, however, for which sna provides functionality not found elsewhere in statnet. Perhaps foremost among these are tools for conducting network inference, i.e., estimation of the structure of an unknown network from noisy and/or incomplete data (Butts 2003). Several classical methods of this type are implemented by the consensus function, which returns the estimate of an unknown graph from a series of observed graphs. Methods supported include data analytic tools such as locally-aggregated structure (Krackhardt 1987a) and central graph (Banks and Carley 1994) estimators, as well as model-based approaches such as the consensus model of Batchelder and Romney (1988). The latter is based on the assumption that each data source has a base chance to "know" and correctly generate the true value of an edge on which they report, otherwise producing a "guess" based on a (possibly biased) Bernoulli trial. These competency and bias parameters are treated as source-level fixed effects, and the latter may be omitted if desired; estimation is by maximum likelihood. A related class of models is supported by the bbnam family of routines, which implements the methods of Butts (2003). The edge reporting process is in this case parameterized in terms of false positive and false negative error rates, which may be fixed at the source level, pooled, or given as known. Estimation is fully Bayesian, with error rate priors (where applicable) specified as beta distributions and graph priors specified in inhomogeneous Bernoulli form. It should be noted that the likelihood of the reporting process assumed by the Butts (2003) model can be reparameterized to match that of the Batchelder and Romney (1988) model for cases in which the sum of false positive and false negative rates is less than 1; the two approaches differ primarily in their prior structure and in the former's allowance for negatively informative reports (e.g., due to systematic deception). bbnam returns draws from the joint posterior distribution of the true graph and error parameters (where applicable), using a multiple-chain Gibbs sampler. The potential scale reduction measure of Gelman and Rubin (1992) (in the simplified form of Gelman et al. 1995) can be applied via potscalered.mcmc to assess convergence, and bbnam.bf supports basic model comparison using approximate Bayes factors. Draws from the model can be used directly, or used to construct point estimates; the helper function npostpred can be employed to easily obtain posterior predictive graph properties from a set of posterior draws.
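The reparameterization mentioned above can be made concrete. Reading the verbal description literally, an informant with competency c and bias b reports an edge with probability c + (1 - c) b when it is present and (1 - c) b when it is absent. The base R sketch below encodes that reading; it is an interpretation of the text rather than code from sna, and the function names br_to_error and error_to_br are hypothetical.

```r
# Competency/bias parameters -> false positive (ep) and false negative (em) rates
br_to_error <- function(competency, bias) {
  list(ep = (1 - competency) * bias,         # guesses "edge" when none exists
       em = (1 - competency) * (1 - bias))   # guesses "no edge" when one exists
}

# Error rates -> competency and bias; only defined when 0 < em + ep < 1
error_to_br <- function(em, ep) {
  stopifnot(em + ep > 0, em + ep < 1)
  list(competency = 1 - em - ep,
       bias = ep / (em + ep))
}

rates <- br_to_error(competency = 0.6, bias = 0.25)
back  <- error_to_br(rates$em, rates$ep)
print(rates)
print(back)
```

Under this reading the two error rates always sum to one minus the competency, which is why the mapping only exists when that sum is below 1; error rates above that bound have no competency/bias counterpart and correspond to the negatively informative reports that only the Butts (2003) parameterization allows.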

Also supported by sna are the methods for estimating biased net parameters shown by Skvoretz et al. (2004). The biased net model stems from early work by Rapoport, who sought to model network structure via a hypothetical "tracing" process. This process may be described loosely as follows. One begins with a small "seed" set of vertices, each member of which is assumed to nominate (generate ties to) other members of the population with some fixed probability. These members, in turn, may nominate new members of the population, as well as members who have already been reached. Such nominations may be "biased" in one fashion or another, leading to a non-uniform growth process. Specifically, let $e_{ij}$ be the random event that vertex $i$ nominates vertex $j$ when reached. Then the conditional probability of $e_{ij}$ is given by

$$\Pr(e_{ij} \mid T) = 1 - \bigl(1 - \Pr(B_e)\bigr) \prod_k \bigl(1 - \Pr(B_k)\bigr)^{s_k(i,j,T)},$$

where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are "bias events" (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
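The conditional tie probability of the biased net model is simple to evaluate directly. The following base R sketch computes it for one directed dyad; the function name tie_prob and the example probabilities are illustrative assumptions and are unrelated to the estimators implemented in bn.

```r
# Pr(e_ij | T) = 1 - (1 - Pr(B_e)) * prod_k (1 - Pr(B_k))^s_k
#   p_base: baseline nomination probability, Pr(B_e)
#   p_bias: vector of bias event probabilities, Pr(B_k)
#   s:      number of times each bias event could have occurred for (i, j)
tie_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

tie_prob(0.05, p_bias = c(0.3, 0.2), s = c(0, 0))  # no bias opportunities: baseline only
tie_prob(0.05, p_bias = c(0.3, 0.2), s = c(1, 0))  # one opportunity for the first bias
tie_prob(0.05, p_bias = c(0.3, 0.2), s = c(1, 2))  # several opportunities combined
```

The tie fails to form only if the baseline event and every potential bias event all fail, so each additional opportunity can only raise the probability.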

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

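$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon, \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu, \qquad (5)$$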

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. $Z$ and $\psi$ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, $W$ and $\theta$ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
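Equations (4) and (5) can be solved for y by inverting the two interaction terms, which is how data from the model may be simulated. The sketch below does this in base R for a single W and a single Z and then checks that both structural equations hold; the helper rand_graph, the dimensions, and the parameter values are assumptions chosen for illustration, and no sna functions are used.

```r
set.seed(42)
n <- 30

# Hypothetical helper: random 0/1 interaction matrix without self-ties
rand_graph <- function(n, p = 0.5) {
  w <- matrix(rbinom(n * n, 1, p), n, n)
  diag(w) <- 0
  w
}

W <- rand_graph(n)                    # AR interaction matrix
Z <- rand_graph(n)                    # MA interaction matrix
X <- matrix(rnorm(n * 2), n, 2)       # two covariates
beta  <- c(1, -0.5)
theta <- 0.2
psi   <- 0.3
sigma <- 0.1

nu  <- rnorm(n, 0, sigma)
eps <- qr.solve(diag(n) - psi * Z, nu)                  # eps = (I - psi Z)^{-1} nu
y   <- qr.solve(diag(n) - theta * W, X %*% beta + eps)  # y = (I - theta W)^{-1} (X beta + eps)

# Both structural equations should hold up to numerical error
ar_ok <- all.equal(as.vector(theta * W %*% y + X %*% beta + eps), as.vector(y))
ma_ok <- all.equal(as.vector(psi * Z %*% eps + nu), eps)
print(c(ar_ok, ma_ok))
```

Solving the linear systems with qr.solve avoids forming the matrix inverses explicitly; the approach requires that the matrices I - theta W and I - psi Z be nonsingular.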


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet), and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.
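The stated means follow from the fact that a beta(a, b) distribution has mean a / (a + b). A quick base R check, using the shape parameters that appear in the example code (the helper name beta_mean is an assumption):

```r
# Mean of a beta(a, b) distribution
beta_mean <- function(a, b) a / (a + b)

beta_mean(1, 25)    # false positive rates, rbeta(20, 1, 25): about 0.038
beta_mean(15, 25)   # false negative rates, rbeta(20, 15, 25): 0.375
beta_mean(2, 11)    # shared beta(2, 11) prior: about 0.154
```

The prior mean of roughly 0.154 matches neither of the two generating distributions, which is the sense in which the priors used here are generic rather than tuned to the data.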

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

   Replicate Chains: 5
   Burn Time: 300
   Draws per Chain: 20 Total Draws: 100
   Potential Scale Reduction (G&R's sqrt(Rhat)):
      Max: 1.003116
      Med: 0.9992194
      IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1
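The potential scale reduction values reported in the MCMC diagnostics compare between-chain and within-chain variability. As a rough illustration of the idea, the base R sketch below computes the textbook form of the Gelman and Rubin statistic for a single scalar quantity; it is an assumption that this matches the simplified form used by potscalered.mcmc only in spirit, and the function name psrf and the simulated chains are hypothetical.

```r
# Textbook potential scale reduction for one scalar quantity.
# draws: matrix with one row per retained draw and one column per chain
psrf <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(draws))         # between-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of the target variance
  sqrt(var_plus / W)
}

set.seed(7)
mixed <- matrix(rnorm(20 * 5), 20, 5)             # 5 chains, 20 draws, same target
stuck <- sweep(mixed, 2, c(0, 0, 0, 0, 10), "+")  # one chain shifted away from the rest

psrf(mixed)   # close to 1 when the chains agree
psrf(stuck)   # well above 1 when one chain sits elsewhere
```

Values near 1, as in the diagnostics above, suggest that the replicate chains are sampling from the same distribution; values well above 1 indicate that a longer burn-in or more draws are needed.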

Although the priors do not reflect the true error distribution bbnam still does a good job ofpinning down the error rates (and the network itself which is actually somewhat easier toestimate in many cases) In practice the bbnam model is fairly robust to choice of priorsso long as the error rate priors do not put a large degree of mass on the ldquoperverserdquo regionfor which em + ep gt 1 Multiple actors whose error rates satisfy this condition with highprobability in the posterior or posterior graph distributions which are strongly multimodalcan be indicators either of excessively ldquoperverserdquo priors or of extreme disagreement amonginformants (eg as would result from systematic deception) Either possibility warrants are-examination of both the userrsquos modeling assumptions and of the data itself

Having obtained a Bayesian point estimate we can also evaluate the performance of variousclassical network estimators The consensus function allows us to calculate several includingthe union and intersection LAS central graph and Romney-Batchelder model

Rgt mean(consensus(dat method = LASintersection) == g)

[1] 07725

Rgt mean(consensus(dat method = LASunion) == g)

[1] 0905

Rgt mean(consensus(dat method = centralgraph) == g)

[1] 09575

Rgt mean(consensus(dat method = romneybatchelder) == g)

44 Social Network Analysis with sna

Estimated competency scores[1] 05384305 05152780 04482434 05333154 07128820 05920044 06278100[8] 07532642 03863239 05535066 05120474 06065419 05147395 06447705[15] 06046575 06121955 07115359 03448647 03351731 04501279Estimated bias parameters[1] 013137940 035170786 006013660 028684742 009962490 004767398[7] 008915006 015302781 022559772 007431412 011489655 015412247[13] 005894590 008052288 009550557 006195760 014675686 024625026[19] 004302486 010195838[1] 1

For this scenario the intersection LAS is an especially poor choice (since it exacerbates theeffects of false negatives) the central graph and Romney-Batchelder models are far betterThe performance of the central graph will degrade quickly however when either false positiveor false negative rates approach or exceed 05 The two likelihood-based methods (bbnam andRomney-Batchelder) can still be quite robust in such such cases provided that total errorrates (false positive plus false negative) are less than 1

As a final example of snarsquos model-based methods we here illustrate the use of lnam to fit alinear network autocorrelation model We show in this case an example which includes bothAR and MA components estimating both effects simultaneously (This example requires thenumDeriv package)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
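The information criteria reported in the goodness-of-fit summary for this example can be checked by hand. The following sketch uses base R and the standard definitions of AIC and BIC. The parameter counts (8 for the fitted model, 2 for the null model) are assumptions inferred from the reported degrees of freedom with n = 50, and reading the heuristic log Bayes factor as a BIC difference is likewise an inference from the numbers rather than something stated in the text.

```r
# Values copied from the summary(fit) output of the lnam example.
n       <- 50       # number of responses
loglik  <- 58.47    # model log likelihood
k_model <- 8        # assumed: 5 covariate effects + 2 autocorrelation terms + sigma
loglik0 <- -82.48   # null model log likelihood
k_null  <- 2        # assumed: mean and standard deviation

aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

aic_model <- aic(loglik, k_model)
bic_model <- bic(loglik, k_model, n)
aic_null  <- aic(loglik0, k_null)
bic_null  <- bic(loglik0, k_null, n)

aic_diff <- aic_null - aic_model   # compare with the reported AIC difference
bic_diff <- bic_null - bic_model   # compare with the heuristic log Bayes factor

round(c(aic_model, bic_model, aic_null, bic_null, aic_diff, bic_diff), 1)
```

Small discrepancies in the last digit are expected, since the log likelihoods entered here are already rounded to two decimal places.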

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. [Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles against Sample Quantiles); Net Influence Plot.]

References

Anselin L (1988). Spatial Econometrics: Methods and Models. Kluwer, Norwell, MA.

Banks D, Carley KM (1994). “Metric Inference for Social Networks.” Journal of Classification, 11(1), 121–149.

Batagelj V, Mrvar A (2007). Pajek: Package for Large Network Analysis. University of Ljubljana, Slovenia. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

Batchelder WH, Romney AK (1988). “Test Theory Without an Answer Key.” Psychometrika, 53(1), 71–92.

Bonacich P (1987). “Power and Centrality: A Family of Measures.” American Journal of Sociology, 92, 1170–1182.

Boorman SA, White HC (1976). “Social Structure from Multiple Networks II. Role Structures.” American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com/.

Borgatti SP, Carley K, Krackhardt D (2006). “Robustness of Centrality Measures Under Conditions of Imperfect Data.” Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com/.

Boyd JP (1969). “The Algebra of Group Kinship.” Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). “Communicating Centrality in Policy Network Drawings.” IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). “An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.” Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). “Positions In Networks.” Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching/.

Butts CT (2003). “Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach.” Social Networks, 25(2), 103–140.

Butts CT (2007). “Permutation Models for Relational Data.” Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). “Multivariate Methods for Interstructural Analysis.” CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). “Some Simple Algorithms for Structural Comparison.” Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network.

Butts CT, Pixley JE (2004). “A Structural Approach to the Representation of Life History Data.” Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). “The Structure of Positive Interpersonal Relations in Small Groups.” In J Berger (ed.), “Sociological Theories in Progress, Volume 2,” pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). “Information Exchange and the Robustness of Organizational Networks.” Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). “Network Autocorrelation Models: Problems and Prospects.” In IDA Griffith (ed.), “Spatial Statistics: Past, Present, and Future,” pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). “Biased Networks and Social Structure Theorems. Part I.” Social Networks, 3, 137–159.

Fararo TJ (1983). “Biased Networks and the Strength of Weak Ties.” Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). “Very Local Structure in Social Networks.” Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). “Markov Graphs.” Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). “Graph Drawing by Force-directed Placement.” Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). “The Contiguity Ratio and Spatial Mapping.” The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
  • Package highlights
    • Random graph generation
      • Example
    • Visualization and data manipulation
      • Neighborhood and ego net functions
      • Visualization
    • Descriptive indices
      • Node-level indices
      • Graph-level indices
    • Connectivity and subgraph statistics
      • Example
    • Position and role analysis
      • Example
    • Exploratory edge set comparison
      • Example
    • Network inference and process models
      • Example
  • Closing comments

of $e_{ij}$ is given by $\Pr(e_{ij} \mid T) = 1 - \left(1 - \Pr(B_e)\right) \prod_k \left(1 - \Pr(B_k)\right)^{s_k(i,j,T)}$, where $T$ is the current state of the trace, $B_e$ is the Bernoulli event corresponding to the baseline probability of $e_{ij}$, and the $B_k$ are “bias events” (of which $s_k$ have potentially occurred for the $(i, j)$ directed dyad). Bias events are taken to be independent Bernoulli trials, given $T$, such that $e_{ij}$ is observed with certainty if any bias event occurs. The specification of a biased net model then involves defining the various bias events (which, in turn, influence the structure of the network). The joint graph distribution under such a model is not, in general, known; as such, estimation for model parameters (bias event probabilities) is currently heuristic. bn currently implements the maximum pseudo-likelihood estimators of Skvoretz et al. (2004), as well as a method of moments estimator based on the expected triad census (also proposed by Skvoretz et al.). Heuristic goodness-of-fit statistics are provided, as well as asymptotic goodness-of-fit tests for dyad and triad statistics.
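As a small numerical illustration of the conditional edge probability given above, the following base R sketch evaluates the formula for one dyad. The function name biased_edge_prob and its arguments are hypothetical and are not part of sna; the probabilities and event counts are made-up values chosen only to show the arithmetic.

```r
# Conditional probability of an edge under a biased net model (illustrative sketch).
#   p_base: baseline edge probability, Pr(B_e)
#   p_bias: vector of bias event probabilities, Pr(B_k)
#   s:      vector of counts s_k(i, j, T) of potential bias events of each type
biased_edge_prob <- function(p_base, p_bias, s) {
  stopifnot(length(p_bias) == length(s))
  # The edge is absent only if the baseline event and every bias event fail.
  1 - (1 - p_base) * prod((1 - p_bias)^s)
}

# No potential bias events: the probability reduces to the baseline.
p_none <- biased_edge_prob(0.1, c(0.3, 0.2), c(0, 0))   # 0.1

# One event of the first type and two of the second:
# 1 - 0.9 * 0.7 * 0.8^2 = 0.5968
p_some <- biased_edge_prob(0.1, c(0.3, 0.2), c(1, 2))
```

Because the events are independent given the trace, each additional potential bias event can only raise the edge probability.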

While much attention in social network analysis is directed to structural properties per se, we may also consider models for the effect of structure on individual attributes. The linear network autocorrelation models (see Doreian (1990), and Cliff and Ord (1973); Anselin (1988) for the equivalent class of spatial autocorrelation models) constitute one important family of processes which are often used for this purpose. These models are of the form

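$$y = \left(\sum_{i=1}^{w} \theta_i W_i\right) y + X\beta + \varepsilon \qquad (4)$$

$$\varepsilon = \left(\sum_{i=1}^{z} \psi_i Z_i\right) \varepsilon + \nu \qquad (5)$$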

where $y \in \mathbb{R}^n$ is a vector of responses, $X \in \mathbb{R}^{n \times x}$ is a covariate matrix, $W \in \mathbb{R}^{w \times n \times n}$ and $Z \in \mathbb{R}^{z \times n \times n}$ are interaction arrays, $\beta \in \mathbb{R}^x$, $\theta \in \mathbb{R}^w$, and $\psi \in \mathbb{R}^z$ are free parameters, and $\nu \sim \mathrm{Norm}(0, \sigma^2)$ is a vector of iid disturbances. Z and ψ combine to form a network moving average (MA) term, which expresses the extent to which disturbances diffuse through the network. Analogously, W and θ describe autocorrelation structure in the responses (network AR effects). Pragmatically, the distinction between the two effect types is the latter's inclusion of impact from neighbors' covariate scores: an AR term implies that each individual's response depends on that of their neighbors (including all covariate, disturbance, and higher-order neighborhood effects), while an MA term implies that conditional dependence between responses is limited to deviations from the expectation. It is thus possible to specify AR and MA effects in isolation, as well as jointly. Within sna, the lnam function performs maximum likelihood estimation for network autocorrelation models. To aid in identifying appropriate weight matrices for use with lnam, sna also supplies a function (nacf) for computation of sample network autocorrelation and autocovariance functions. nacf can compute correlations/covariances for partial and complete in-, out-, and combined neighborhoods of various orders, as well as autocorrelation indices such as Moran's I (Moran 1950) and Geary's C (Geary 1954). Prior inspection of network autocorrelation functions can aid in proposing weight matrices for subsequent evaluation (in analogy to similar heuristics within the time series literature; see, e.g., Brockwell and Davis 1991). Functions such as sedist can also be used to construct matrices based on other structural properties (e.g., structural equivalence); see Leenders (2002) for a useful discussion.
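For simulation purposes it can help to write the model in reduced form. The following is a short derivation from Equations (4) and (5), assuming that both $I - \sum_i \psi_i Z_i$ and $I - \sum_i \theta_i W_i$ are invertible (an assumption not stated in the text, but needed for the model to determine $y$ uniquely).

```latex
\begin{align*}
\varepsilon &= \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu, \\
y &= \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1} \bigl(X\beta + \varepsilon\bigr)
   = \Bigl(I - \sum_{i=1}^{w} \theta_i W_i\Bigr)^{-1}
     \Bigl[X\beta + \Bigl(I - \sum_{i=1}^{z} \psi_i Z_i\Bigr)^{-1} \nu\Bigr].
\end{align*}
```

Setting all $\psi_i = 0$ gives a pure AR model with $\varepsilon = \nu$, while setting all $\theta_i = 0$ gives a pure MA model in which $y = X\beta + \varepsilon$; in either case a draw of $y$ requires only the solution of linear systems in the disturbances.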


Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+    dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+    epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20  Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894
R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649
R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
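The gap between the priors and the simulated error rates in this example can be made concrete with a few lines of base R. This is a rough sketch: the Monte Carlo step assumes that an informant's false negative and false positive rates are independent under the prior, which is an assumption made here for illustration rather than a statement about bbnam's internals.

```r
# Mean of a Beta(a, b) distribution.
beta_mean <- function(a, b) a / (a + b)

true_fp_mean <- beta_mean(1, 25)    # false positive rates were drawn from Beta(1, 25)
true_fn_mean <- beta_mean(15, 25)   # false negative rates were drawn from Beta(15, 25)
prior_mean   <- beta_mean(2, 11)    # the same Beta(2, 11) prior was used for both rates

round(c(true_fp_mean, true_fn_mean, prior_mean), 3)   # 0.038 0.375 0.154

# Monte Carlo estimate of the prior mass on the "perverse" region em + ep > 1,
# assuming independent Beta(2, 11) priors on the two rates.
set.seed(42)
draws <- 1e5
perverse_mass <- mean(rbeta(draws, 2, 11) + rbeta(draws, 2, 11) > 1)
```

The prior mean lies well above the typical false positive rate and well below the typical false negative rate, yet the prior places almost no mass on the perverse region, which is consistent with the robustness described above.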


Doreian P Batagelj V Ferlioj A (2005) Generalized Blockmodeling Cambridge UniversityPress Cambridge

Fararo TJ (1981) ldquoBiased Networks and Social Structure Theorems Part Irdquo Social Networks3 137ndash159

Fararo TJ (1983) ldquoBiased Networks and the Strength of Weak Tiesrdquo Social Networks 51ndash11

Fararo TJ Sunshine MH (1964) A Study of a Biased Friendship Net Youth DevelopmentCenter Syracuse NY

Faust K (2007) ldquoVery Local Structure in Social Networksrdquo Sociological Methodology 37209ndash256

Frank O Strauss D (1986) ldquoMarkov Graphsrdquo Journal of the American Statistical Association81(395) 832ndash842

Freeman LC (1979) ldquoCentrality in Social Networks Conceptual Clarificationrdquo Social Net-works 1(3) 223ndash258

Freeman LC (2004) The Development of Social Network Analysis A Study in the Sociologyof Science Empirical Press Vancouver

Fruchterman TMJ Reingold EM (1991) ldquoGraph Drawing by Force-directed PlacementrdquoSoftware ndash Practice and Experience 21(11) 1129ndash1164

Geary R (1954) ldquoThe Contiguity Ratio and Spatial Mappingrdquo The Incorporated Statistician5 115ndash145

Gelman A Carlin JB Stern HS Rubin DB (1995) Bayesian Data Analysis Chapman ampHallCRC London

Gelman A Rubin DB (1992) ldquoInference from Iterative Simulation Using Multiple SequencesrdquoStatistical Science 7 457ndash511

Gentleman RC Carey VJ Bates DM Bolstad B Dettling M Dudoit S Ellis B GautierL Ge Y Gentry J Hornik K Hothorn T Huber W Iacus S Irizarry R Leisch F Li CMaechler M Rossini AJ Sawitzki G Smith C Smyth G Tierney L Yang JYH Zhang

Journal of Statistical Software 49

J (2004) ldquoBioconductor Open Software Development for Computational Biology andBioinformaticsrdquo Genome Biology 5 R80 URL httpgenomebiologycom2004510R80

Gilks WR Richardson S Spiegelhalter DJ (eds) (1996) Markov Chain Monte Carlo inPractice Chapman amp HallCRC New York

Gould R Fernandez R (1989) ldquoStructures of Mediation A Formal Approach to Brokeragein Transaction Networksrdquo Sociological Methodology 19 89ndash126

Hall KM (1970) ldquoAn r-dimensional Quadratic Placement Algorithmrdquo Management Science17 219ndash229

Handcock MS Hunter DR Butts CT Goodreau SM Morris M (2003) statnet Soft-ware Tools for the Statistical Modeling of Network Data Statnet Project httpstatnetprojectorg Seattle WA R package version 20 URL httpCRANR-projectorgpackage=statnet

Holland PW Leinhardt S (1970) ldquoA Method for Detecting Structure in Sociometric DatardquoAmerican Journal of Sociology 70 492ndash513

Hubert LJ (1987) Assignment Methods in Combinatorial Data Analysis Marcel DekkerNew York

Huisman M van Duijn MAJ (2003) ldquoStOCNET Software for the Statistical Analysis ofSocial Networksrdquo Connections 25(1) 7ndash26

Ingram P Roberts PW (2000) ldquoFriendships Among Competitors in the Sydney Hotel Indus-tryrdquo American Journal of Sociology 106 387ndash423

Kamada T Kawai S (1989) ldquoAn Algorithm for Drawing General Undirected Graphsrdquo Infor-mation Processing Letters 31(1) 7ndash15

Koenker R Ng P (2007) SparseM Sparse Linear Algebra R package version 073 URLhttpCRANR-projectorgpackage=SparseM

Krackhardt D (1987a) ldquoCognitive Social Structuresrdquo Social Networks 9(2) 109ndash134

Krackhardt D (1987b) ldquoQAP Partialling as a Test of Spuriousnessrdquo Social Networks 9(2)171ndash186

Krackhardt D (1988) ldquoPredicting with Networks Nonparametric Multiple Regression Anal-yses of Dyadic Datardquo Social Networks 10 359ndash382

Krackhardt D (1994) ldquoGraph Theoretical Dimensions of Informal Organizationsrdquo In KM Car-ley MJ Prietula (eds) ldquoComputational Organizational Theoryrdquo pp 88ndash111 LawrenceErlbaum Associates Hillsdale NJ

Krackhardt D Blythe J McGrath C (1994) ldquoKrackPlot 30 An Improved Network DrawingProgramrdquo Connections 17(2) 53ndash55

Leenders TTAJ (2002) ldquoModeling Social Influence Through Network Autocorrelation Con-structing the Weight Matrixrdquo Social Networks 24(1) 21ndash47

50 Social Network Analysis with sna

Marsden PV (2005) ldquoRecent Developments in Network Measurementrdquo In PJ CarringtonJ Scott S Wasserman (eds) ldquoModels and Methods in Social Network Analysisrdquo chapter 2pp 8ndash30 Cambridge University Press Cambridge

Mayhew BH (1984) ldquoBaseline Models of Sociological Phenomenardquo Journal of MathematicalSociology 9 259ndash281

Moran PAP (1950) ldquoNotes on Continuous Stochastic Phenomenardquo Biometrika 37 17ndash23

Pattison P Robins GL (2002) ldquoNeighbourhood-Based Models for Social Networksrdquo Socio-logical Methodology 32 301ndash337

Rapoport A (1957) ldquoA Contribution to the Theory of Random and Biased Netsrdquo Bulletinof Mathematical Biophysics 15 523ndash533

R Development Core Team (2007) R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0Version 261 URL httpwwwR-projectorg

Richards WD Seary AJ (2006) MultiNet for Windows Version 475 URL httpwwwsfuca~richardsMultinetPagesmultinethtm

Romney AK Weller SC Batchelder WH (1986) ldquoCulture as Consensus A Theory of Cultureand Informant Accuracyrdquo American Anthropologist 88(2) 313ndash338

Sabidussi G (1966) ldquoThe Centrality Index of a Graphrdquo Psychometrika 31 581ndash603

Shimbel A (1953) ldquoStructural Parameters of Communication Networksrdquo Bulletin of Mathe-matical Biophysics 15 501ndash507

Skvoretz J Fararo TJ Agneessens F (2004) ldquoAdvances in Biased Net Theory DefinitionsDerivations and Estimationsrdquo Social Networks 26 113ndash139

Snijders TAB (2001) SIENA Simulation Investigation for Empirical Network AnalysisVersion 31 URL httpstatgammarugnlsnijderssienahtml

Snijders TAB (2002) ldquoMarkov Chain Monte Carlo Estimation of Exponential Random GraphModelsrdquo Journal of Social Structure 3(2)

Stallman RM (2002) Free Software Free Society Selected Essays of Richard M StallmanGNU PressFree Software Foundation Boston MA

Stephenson K Zelen M (1989) ldquoRethinking Centrality Methods and Applicationsrdquo SocialNetworks 11 1ndash37

Stokman FN Van Veen FJAM (1981) GRADAP Graph Definition and Analysis Pack-age Userrsquos Manual Interuniversity Project Group GRADAP University of Amsterdam-Groningen-Nijmegen URL httpwwwassesscom

Wasserman S Robins G (2005) ldquoAn Introduction to Random Graphs Dependence Graphsand plowastrdquo In PJ Carrington J Scott S Wasserman (eds) ldquoModels and Methods in SocialNetwork Analysisrdquo chapter 10 pp 192ndash214 Cambridge University Press Cambridge

Journal of Statistical Software 51

Wasserman SS Faust K (1994) Social Network Analysis Methods and Applications Struc-tural Analysis in the Social Sciences Cambridge University Press Cambridge

Watts DJ Strogatz SH (1998) ldquoCollective Dynamics of lsquoSmall-Worldrsquo Networksrdquo Nature393 440ndash442

West DB (1996) Introduction to Graph Theory Prentice Hall Upper Saddle River NJ

White HC (1963) An Anatomy of Kinship Englewood Cliffs NJ Prentice Hall

Affiliation

Carter T ButtsDepartment of Sociology and Institute for Mathematical Behavioral SciencesUniversity of California IrvineIrvine CA 92697-5100 United States of AmericaE-mail buttscucieduURL httpwwwfacultyucieduprofilecfmfaculty_id=5057

Journal of Statistical Software httpwwwjstatsoftorgpublished by the American Statistical Association httpwwwamstatorg

Volume 24 Issue 6 Submitted 2007-06-01February 2008 Accepted 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
          • Package highlights
            • Random graph generation
              • Example
                • Visualization and data manipulation
                  • Neighborhood and ego net functions
                  • Visualization
                    • Descriptive indices
                      • Node-level indices
                      • Graph-level indices
                        • Connectivity and subgraph statistics
                          • Example
                            • Position and role analysis
                              • Example
                                • Exploratory edge set comparison
                                  • Example
                                    • Network inference and process models
                                      • Example
                                          • Closing comments
Page 40: Social Network Analysis with sna package

40 Social Network Analysis with sna

Example

To demonstrate the use of sna's network inference procedures, we begin by creating a fictitious data set in which we are given reports regarding the state of the network (g) from 20 error-prone informants. As a fairly realistic test case, we take the informants' false positive rates (ep) to be beta distributed with a mean of 0.038, and their false negative rates (em) to be likewise beta distributed with a mean of 0.375 (about ten times higher). We then subject this data to bbnam, employing some fairly generic priors. Specifically, we employ an uninformative network prior (specified by pnet) and identical beta(2, 11) priors for all error rates. The summary function for the returned network describes the resulting posterior properties, along with various diagnostics.

R> g <- rgraph(20)
R> ep <- rbeta(20, 1, 25)
R> em <- rbeta(20, 15, 25)
R> dat <- array(dim = c(20, 20, 20))
R> for(i in 1:20)
+   dat[i, , ] <- rgraph(20, 1, tprob = (g * (1 - em[i]) + (1 - g) * ep[i]))
R> pnet <- matrix(0.5, ncol = 20, nrow = 20)
R> pem <- matrix(nrow = 20, ncol = 2)
R> pem[, 1] <- 2
R> pem[, 2] <- 11
R> pep <- matrix(nrow = 20, ncol = 2)
R> pep[, 1] <- 2
R> pep[, 2] <- 11
R> b <- bbnam(dat, model = "actor", nprior = pnet, emprior = pem,
+   epprior = pep, burntime = 300, draws = 100)
R> summary(b)

Butts' Hierarchical Bayes Model for Network Estimation/Informant Accuracy

Multiple Error Probability Model

Marginal Posterior Network Distribution:

      a1   a2   a3   a4   a5   a6   a7   a8   a9  a10  a11  a12  a13  a14  a15
a1  0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a2  0.00 0.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00
a3  0.00 1.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00
a4  0.01 1.00 1.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00
a5  1.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 1.00 1.00 1.00 0.00
a6  0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.18 1.00 0.00 0.00 1.00
a7  1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00
a8  0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00
a9  0.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 1.00 1.00
a10 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00
a11 0.00 0.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00
a12 1.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20  Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115
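The "Potential Scale Reduction" entries above are Gelman-Rubin statistics, for which values near 1 suggest that the replicate chains agree with one another. As a rough illustration only, the following base R sketch computes a simplified version of the statistic for a single scalar parameter. The function name `psrf_sqrt` and the simulated chains are hypothetical, the degrees-of-freedom correction of the original proposal is omitted, and this is not claimed to be the code that bbnam itself runs.

```r
# Simplified Gelman-Rubin potential scale reduction for one scalar parameter.
# `chains` is an n x m matrix: n draws (rows) from each of m chains (columns).
# Hypothetical helper for illustration; not part of sna.
psrf_sqrt <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B_over_n <- var(colMeans(chains))      # variance of the chain means (B / n)
  var_hat <- (n - 1) / n * W + B_over_n  # pooled estimate of the target variance
  sqrt(var_hat / W)
}

set.seed(1)
# Five chains sampling the same distribution: the statistic should be close to 1.
mixed <- matrix(rnorm(2000 * 5), nrow = 2000, ncol = 5)
# Five chains stuck around different means: the statistic should be well above 1.
stuck <- sweep(mixed, 2, 0:4, "+")

psrf_mixed <- psrf_sqrt(mixed)
psrf_stuck <- psrf_sqrt(stuck)
print(c(mixed = psrf_mixed, stuck = psrf_stuck))
```

Note that the statistic can fall slightly below 1 by chance, which is consistent with the median reported in the output above.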

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the “perverse” region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively “perverse” priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
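To make the remarks about the priors concrete, the base R sketch below compares the mean of the beta(2, 11) prior with the means of the beta distributions used to generate the error rates, and numerically integrates the prior mass that falls in the "perverse" region em + ep > 1. It assumes that the two error rates are independent a priori; the variable names are hypothetical and nothing here calls sna.

```r
# Mean of a beta(a, b) distribution: a / (a + b).
beta_mean <- function(a, b) a / (a + b)
true_ep_mean <- beta_mean(1, 25)    # generating distribution of false positive rates
true_em_mean <- beta_mean(15, 25)   # generating distribution of false negative rates
prior_mean   <- beta_mean(2, 11)    # beta(2, 11) prior used for both error rates

# Prior probability of the "perverse" region em + ep > 1, assuming independent
# beta(2, 11) priors: P(em + ep > 1) = E[ P(em > 1 - ep | ep) ].
perverse_mass <- integrate(
  function(ep) pbeta(1 - ep, 2, 11, lower.tail = FALSE) * dbeta(ep, 2, 11),
  lower = 0, upper = 1
)$value

print(round(c(ep = true_ep_mean, em = true_em_mean, prior = prior_mean), 3))
print(perverse_mass)
```

The prior mean lies between the two generating means, so the prior matches neither error distribution, yet it places very little mass on the perverse region.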

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
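The comparison above can be reproduced in spirit without sna. The following base R sketch simulates informant reports under the same error model and computes the central graph and the two locally aggregated structures (LAS) by hand. It makes several simplifying assumptions: every informant shares the same fixed error rates (set to the means used above), the central graph is taken to be a strict majority vote, and each LAS uses only the reports of the two actors in the dyad. These are illustrative readings of the estimators, not sna's own implementations.

```r
set.seed(42)
n <- 20       # actors, each of whom also acts as an informant
em <- 0.375   # false negative rate (fixed here for simplicity; an assumption)
ep <- 0.038   # false positive rate (fixed here for simplicity; an assumption)

# True network: random digraph with tie probability 0.5 and no self-ties.
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0

# dat[k, i, j] is informant k's report on the i -> j tie; a report equals 1
# with probability g * (1 - em) + (1 - g) * ep.
dat <- array(0, dim = c(n, n, n))
for (k in 1:n) {
  dat[k, , ] <- matrix(rbinom(n * n, 1, g * (1 - em) + (1 - g) * ep), n, n)
}

# Central graph: a tie is present if more than half of the informants report it.
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# Locally aggregated structures: use only the reports of the two actors involved.
own_i <- own_j <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) {
  own_i[i, j] <- dat[i, i, j]   # sender's report on the i -> j tie
  own_j[i, j] <- dat[j, i, j]   # receiver's report on the i -> j tie
}
las_int <- own_i * own_j        # both actors must report the tie
las_uni <- pmax(own_i, own_j)   # either actor may report the tie

off <- row(g) != col(g)         # compare off-diagonal entries only
acc <- c(intersection = mean(las_int[off] == g[off]),
         union        = mean(las_uni[off] == g[off]),
         central      = mean(central[off] == g[off]))
print(round(acc, 3))
```

Because the intersection rule requires two independent reports of each tie, a high false negative rate compounds, which is why it trails the central graph in this setting.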

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

  Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4
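The simulated data in the lnam example follow a model with an autoregressive part, y = r1 W1 y + X beta + e, and a moving-average part in the disturbances, e = r2 W2 e + nu. The base R sketch below generates data in the same way and checks that both structural equations hold. One deliberate departure is labeled in the code: the weight matrices are row-normalized (via the hypothetical helper `rand_weights`) so that each linear system is guaranteed to be solvable, whereas the example above passes raw rgraph output.

```r
set.seed(7)
n <- 50

# Hypothetical helper: random binary adjacency matrix, row-normalized so that
# each nonempty row sums to 1. This keeps diag(n) - r * w invertible for
# |r| < 1. (Assumption: the example above uses unnormalized rgraph() output.)
rand_weights <- function(n) {
  w <- matrix(rbinom(n * n, 1, 0.5), n, n)
  diag(w) <- 0
  w / pmax(rowSums(w), 1)
}

w1 <- rand_weights(n)             # weights for the AR component
w2 <- rand_weights(n)             # weights for the MA component
x <- matrix(rnorm(n * 5), n, 5)   # covariates
beta <- rnorm(5)
r1 <- 0.2
r2 <- 0.3
sigma <- 0.1

nu <- rnorm(n, 0, sigma)                           # iid normal innovations
e <- qr.solve(diag(n) - r2 * w2, nu)               # solves e = r2 * w2 e + nu
y <- qr.solve(diag(n) - r1 * w1, x %*% beta + e)   # solves y = r1 * w1 y + x beta + e

# Both structural equations should hold up to numerical error.
ar_gap <- max(abs(y - (r1 * w1 %*% y + x %*% beta + e)))
ma_gap <- max(abs(e - (r2 * w2 %*% e + nu)))
print(c(ar_gap = ar_gap, ma_gap = ma_gap))
```

Fitting the model amounts to recovering beta, r1, r2, and sigma from y, x, w1, and w2 alone, which is what the coefficient table above reports.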

In addition to the above diagnostics, plot(fit) produces residual plots and a “net influence plot,” which depicts the total influence of each vertex on each other vertex in network form. Pairs (i, j) for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
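The edge-coloring rule just described is a simple two-standard-deviation threshold, which the base R sketch below applies to a stand-in influence matrix. The matrix used here, the inverse of I - 0.2 W for a row-normalized random W, is purely an assumption made for illustration; how plot(fit) actually derives net influence from the fitted model is not shown in this section, and the names `infl` and `edge_col` are hypothetical.

```r
set.seed(11)
n <- 50
w <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(w) <- 0
w <- w / pmax(rowSums(w), 1)    # row-normalized weights (an assumption)

# Stand-in "net influence" matrix: entry [i, j] is treated as i's influence on j.
infl <- solve(diag(n) - 0.2 * w)
diag(infl) <- NA                # ignore self-influence

m <- mean(infl, na.rm = TRUE)
s <- sd(as.vector(infl), na.rm = TRUE)

# Flag pairs at least two standard deviations above or below the mean.
edge_col <- matrix(NA_character_, n, n)
edge_col[which(infl >= m + 2 * s)] <- "green"   # unusually high influence
edge_col[which(infl <= m - 2 * s)] <- "red"     # unusually low influence

print(table(edge_col, useNA = "ifany"))
```

Pairs left as NA are those whose influence lies within two standard deviations of the mean and would not be drawn as colored edges under this rule.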

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (Theoretical Quantiles vs. Sample Quantiles); Net Influence Plot.


Affiliation

Carter T ButtsDepartment of Sociology and Institute for Mathematical Behavioral SciencesUniversity of California IrvineIrvine CA 92697-5100 United States of AmericaE-mail buttscucieduURL httpwwwfacultyucieduprofilecfmfaculty_id=5057

Journal of Statistical Software httpwwwjstatsoftorgpublished by the American Statistical Association httpwwwamstatorg

Volume 24 Issue 6 Submitted 2007-06-01February 2008 Accepted 2007-12-25



a13 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 0.00 0.00
a14 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
a15 1.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
a16 0.00 1.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
a17 1.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00
a18 1.00 0.00 1.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
a19 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00
a20 0.00 1.00 0.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00

     a16  a17  a18  a19  a20
a1  1.00 1.00 1.00 0.00 0.00
a2  1.00 0.00 0.00 1.00 1.00
a3  0.00 0.00 1.00 0.00 1.00
a4  0.00 1.00 0.00 1.00 1.00
a5  1.00 1.00 0.00 0.00 1.00
a6  0.00 0.00 0.00 1.00 0.00
a7  1.00 0.00 0.00 0.00 0.00
a8  0.00 0.00 1.00 0.00 1.00
a9  1.00 1.00 1.00 1.00 0.00
a10 0.00 1.00 1.00 1.00 0.00
a11 1.00 1.00 0.00 1.00 1.00
a12 1.00 0.00 1.00 1.00 0.00
a13 0.00 0.00 1.00 0.00 1.00
a14 0.00 0.00 0.00 0.00 0.00
a15 1.00 0.00 1.00 0.00 1.00
a16 0.00 0.00 1.00 0.00 0.00
a17 0.00 0.00 1.00 0.00 1.00
a18 0.00 0.00 0.00 1.00 0.00
a19 0.00 0.00 0.00 0.00 1.00
a20 1.00 1.00 1.00 1.00 0.00

Marginal Posterior Global Error Distribution:

             e^-       e^+
Min    0.1443951 0.0004238
1stQ   0.3126975 0.0167584
Median 0.3678306 0.0294646
Mean   0.3783663 0.0493688
3rdQ   0.4423027 0.0574099
Max    0.6909116 0.2262239

Marginal Posterior Error Distribution (by observer):

Probability of False Negatives (e^-):

       Min   1stQ Median   Mean   3rdQ    Max
o1  0.3132 0.3599 0.3798 0.3864 0.4073 0.5071
o2  0.2613 0.2944 0.3115 0.3187 0.3419 0.3995
o3  0.4148 0.4724 0.4937 0.4948 0.5213 0.5649
o4  0.2511 0.3075 0.3246 0.3257 0.3448 0.4085
o5  0.1814 0.2417 0.2681 0.2678 0.2887 0.3434
o6  0.2881 0.3531 0.3761 0.3766 0.4046 0.4488
o7  0.2395 0.3028 0.3211 0.3244 0.3449 0.3951
o8  0.1444 0.2011 0.2209 0.2212 0.2398 0.2922
o9  0.3708 0.4358 0.4529 0.4578 0.4787 0.5503
o10 0.3210 0.3724 0.3967 0.3982 0.4259 0.4751
o11 0.3064 0.3847 0.4093 0.4109 0.4371 0.5007
o12 0.2367 0.3132 0.3354 0.3349 0.3607 0.4455
o13 0.3534 0.4144 0.4386 0.4382 0.4600 0.5337
o14 0.2438 0.2985 0.3235 0.3229 0.3452 0.4184
o15 0.2585 0.3299 0.3510 0.3519 0.3706 0.4704
o16 0.2502 0.3298 0.3481 0.3509 0.3699 0.4268
o17 0.1759 0.2273 0.2488 0.2503 0.2668 0.3372
o18 0.3959 0.4468 0.4646 0.4710 0.4922 0.5812
o19 0.4944 0.5736 0.6007 0.5975 0.6189 0.6909
o20 0.3737 0.4433 0.4631 0.4671 0.4916 0.5607

Probability of False Positives (e^+):

          Min      1stQ    Median      Mean      3rdQ       Max
o1  0.0195433 0.0397919 0.0490722 0.0510872 0.0585109 0.1069030
o2  0.1067928 0.1395067 0.1555455 0.1569023 0.1714084 0.2262239
o3  0.0084268 0.0165518 0.0224858 0.0236948 0.0293221 0.0551761
o4  0.0712109 0.1047058 0.1137249 0.1180402 0.1320136 0.1723854
o5  0.0034994 0.0103378 0.0150617 0.0169536 0.0212638 0.0468961
o6  0.0004238 0.0040509 0.0068522 0.0082363 0.0098606 0.0279960
o7  0.0061597 0.0136434 0.0192100 0.0207973 0.0266508 0.0484633
o8  0.0072124 0.0204896 0.0260316 0.0282562 0.0350608 0.0593586
o9  0.0804463 0.1092987 0.1213202 0.1246571 0.1372326 0.1935724
o10 0.0065188 0.0135991 0.0194675 0.0223006 0.0278075 0.0594150
o11 0.0173415 0.0358252 0.0445098 0.0464278 0.0551955 0.0828446
o12 0.0185894 0.0416346 0.0499440 0.0516976 0.0573815 0.1202316
o13 0.0029818 0.0108936 0.0155202 0.0170049 0.0209790 0.0401566
o14 0.0044849 0.0108034 0.0166631 0.0178764 0.0226294 0.0486647
o15 0.0084143 0.0199868 0.0271149 0.0290795 0.0355966 0.0606914
o16 0.0009067 0.0078736 0.0124531 0.0139218 0.0187929 0.0455700
o17 0.0066611 0.0216195 0.0273388 0.0290307 0.0346110 0.0691573
o18 0.0846863 0.1344580 0.1508170 0.1485688 0.1628176 0.2036186
o19 0.0037608 0.0117982 0.0171030 0.0179751 0.0225298 0.0466090
o20 0.0214701 0.0348032 0.0433397 0.0448676 0.0516594 0.0936080

MCMC Diagnostics:

Replicate Chains: 5
Burn Time: 300
Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
Max: 1.003116 Med: 0.9992194 IQR: 0.0004545115
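The potential scale reduction values reported in the MCMC diagnostics compare between-chain and within-chain variability; values close to 1 suggest that the replicate chains have mixed. The following is a minimal sketch of one common form of the Gelman-Rubin statistic for a single scalar parameter, using base R only. The function name psrf and the simulated draws are illustrative assumptions, and the exact computation used internally by bbnam may differ.

```r
# Gelman-Rubin potential scale reduction for one scalar parameter.
# `draws` is a (draws per chain) x (number of chains) matrix.
psrf <- function(draws) {
  n <- nrow(draws)                      # draws per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(apply(draws, 2, var))       # mean within-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_hat / W)                     # sqrt(Rhat)
}

set.seed(1)
# Hypothetical well-mixed output: 5 chains of 20 draws from one distribution.
mixed <- matrix(rnorm(100), nrow = 20, ncol = 5)
# Hypothetical poorly mixed output: each chain sits near a different value.
stuck <- mixed + rep(c(-4, -2, 0, 2, 4), each = 20)

psrf(mixed)   # close to 1
psrf(stuck)   # well above 1
```

With only 20 draws per chain, as in the example above, the statistic can fall slightly below 1 for individual parameters, which is consistent with the reported median.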

R> cor(em, apply(b$em, 2, median))

[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))

[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)

[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
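One way to check a candidate prior against the "perverse" region before fitting is to estimate its mass there by simulation. The sketch below assumes independent Beta priors on the false negative and false positive rates; the helper name perverse_mass and the shape parameters shown are illustrative choices, not values taken from the example above.

```r
# Monte Carlo estimate of the prior mass on the region em + ep > 1,
# assuming independent Beta priors on the two error rates.
perverse_mass <- function(em_shape, ep_shape, draws = 1e5) {
  em <- rbeta(draws, em_shape[1], em_shape[2])  # false negative rate
  ep <- rbeta(draws, ep_shape[1], ep_shape[2])  # false positive rate
  mean(em + ep > 1)
}

set.seed(42)
perverse_mass(c(3, 5), c(3, 5))    # moderately informative priors
perverse_mass(c(1, 1), c(1, 1))    # flat priors: half the mass lies in the region
perverse_mass(c(1, 11), c(1, 11))  # priors concentrated near zero
```

Priors for which this quantity is large are the ones most likely to produce the multimodal or high-error posteriors described above.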

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model.

R> mean(consensus(dat, method = "LAS.intersection") == g)

[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)

[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)

[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)

Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
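To make the contrast between these estimators concrete, the following base R sketch builds the central graph and the two LAS variants by hand from simulated informant reports. It is an illustration under stated assumptions rather than the code used by consensus: the objects g and dat are freshly simulated here, the error rates 0.35 and 0.05 are hypothetical, the central graph is taken to be a simple majority vote, and the LAS is taken to use only the reports of the two parties to each potential tie.

```r
set.seed(7)
n <- 20
# Hypothetical true directed network without self-ties.
g <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g) <- 0

# Each of the n actors reports on the whole network, with assumed
# false negative rate em and false positive rate ep.
em <- 0.35
ep <- 0.05
dat <- array(0, dim = c(n, n, n))   # dat[k, i, j]: k's report of the i -> j tie
for (k in 1:n) {
  p_report <- ifelse(g == 1, 1 - em, ep)
  rep_k <- matrix(rbinom(n * n, 1, p_report), n, n)
  diag(rep_k) <- 0
  dat[k, , ] <- rep_k
}

# Central graph: a tie is present if a majority of informants report it.
central <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# LAS: use only the reports of sender i and receiver j for the i -> j tie.
las_union <- las_inter <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) {
  if (i != j) {
    own <- c(dat[i, i, j], dat[j, i, j])
    las_union[i, j] <- max(own)   # either party reports the tie
    las_inter[i, j] <- min(own)   # both parties report the tie
  }
}

# Proportion of cells matching the true network, by estimator.
sapply(list(central = central, union = las_union, intersection = las_inter),
       function(est) mean(est == g))
```

Because the intersection rule requires two independent reports of every true tie, a high false negative rate removes many real edges, which is the mechanism behind its poor showing in the scenario above.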

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show, in this case, an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
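The simulation used in this example can be read as drawing from a model with one autoregressive and one moving average weight matrix. Written out in notation assumed here (the symbols are chosen to mirror the variable names r1, r2, w1, w2, x, beta, nu, and sigma in the R commands), the data-generating process is:

```latex
\begin{aligned}
y &= \rho_1 W_1 y + X\beta + e, \\
e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\text{so that}\quad
e &= (I - \rho_2 W_2)^{-1} \nu, \\
y &= (I - \rho_1 W_1)^{-1} \left( X\beta + e \right).
\end{aligned}
```

The last two lines correspond to the two linear solves used to construct e and y in the example.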

R> w1 <- rgraph(50)

R> w2 <- rgraph(50)

R> x <- matrix(rnorm(50 * 5), 50, 5)

R> r1 <- 0.2

R> r2 <- 0.3

R> sigma <- 0.1

R> beta <- rnorm(5)

R> nu <- rnorm(50, 0, sigma)

R> e <- qr.solve(diag(50) - r2 * w2, nu)

R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)

R> fit <- lnam(y, x, w1, w2)

R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
AIC: -100.9 BIC: -85.65

Null model: meanstd
Null log likelihood: -82.48 on 48 degrees of freedom
AIC: 169.0 BIC: 172.8
AIC difference (model versus null): 269.9
Heuristic Log Bayes Factor (model versus null): 258.4
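The goodness-of-fit figures in this summary can be checked by hand from the reported log likelihoods. The sketch below uses the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n, with k taken as the number of observations minus the reported residual degrees of freedom. Treating the heuristic log Bayes factor as the difference in BIC between the null and fitted models is an assumption made here; it is consistent with the printed values.

```r
n <- 50                 # observations in the simulated example
loglik_model <- 58.47   # model log likelihood from the summary
loglik_null <- -82.48   # null ("meanstd") log likelihood
k_model <- 50 - 42      # 5 covariates, rho1, rho2, and sigma
k_null <- 50 - 48       # mean and standard deviation

# Information criteria from a log likelihood and a parameter count.
ic <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k, BIC = -2 * loglik + k * log(n))
}

model_ic <- ic(loglik_model, k_model, n)
null_ic <- ic(loglik_null, k_null, n)

round(model_ic, 2)                          # compare with AIC -100.9, BIC -85.65
round(null_ic, 2)                           # compare with AIC 169.0, BIC 172.8
unname(null_ic["AIC"] - model_ic["AIC"])    # AIC difference (model versus null)
unname(null_ic["BIC"] - model_ic["BIC"])    # BIC difference (model versus null)
```

Small discrepancies in the last digit arise because the inputs are the rounded values printed in the summary.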

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. (Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot.)


Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
          • Package highlights
            • Random graph generation
              • Example
                • Visualization and data manipulation
                  • Neighborhood and ego net functions
                  • Visualization
                    • Descriptive indices
                      • Node-level indices
                      • Graph-level indices
                        • Connectivity and subgraph statistics
                          • Example
                            • Position and role analysis
                              • Example
                                • Exploratory edge set comparison
                                  • Example
                                    • Network inference and process models
                                      • Example
                                          • Closing comments
Page 42: Social Network Analysis with sna package

42 Social Network Analysis with sna

o3 04148 04724 04937 04948 05213 05649o4 02511 03075 03246 03257 03448 04085o5 01814 02417 02681 02678 02887 03434o6 02881 03531 03761 03766 04046 04488o7 02395 03028 03211 03244 03449 03951o8 01444 02011 02209 02212 02398 02922o9 03708 04358 04529 04578 04787 05503o10 03210 03724 03967 03982 04259 04751o11 03064 03847 04093 04109 04371 05007o12 02367 03132 03354 03349 03607 04455o13 03534 04144 04386 04382 04600 05337o14 02438 02985 03235 03229 03452 04184o15 02585 03299 03510 03519 03706 04704o16 02502 03298 03481 03509 03699 04268o17 01759 02273 02488 02503 02668 03372o18 03959 04468 04646 04710 04922 05812o19 04944 05736 06007 05975 06189 06909o20 03737 04433 04631 04671 04916 05607

Probability of False Positives (e^+)

Min 1stQ Median Mean 3rdQ Maxo1 00195433 00397919 00490722 00510872 00585109 01069030o2 01067928 01395067 01555455 01569023 01714084 02262239o3 00084268 00165518 00224858 00236948 00293221 00551761o4 00712109 01047058 01137249 01180402 01320136 01723854o5 00034994 00103378 00150617 00169536 00212638 00468961o6 00004238 00040509 00068522 00082363 00098606 00279960o7 00061597 00136434 00192100 00207973 00266508 00484633o8 00072124 00204896 00260316 00282562 00350608 00593586o9 00804463 01092987 01213202 01246571 01372326 01935724o10 00065188 00135991 00194675 00223006 00278075 00594150o11 00173415 00358252 00445098 00464278 00551955 00828446o12 00185894 00416346 00499440 00516976 00573815 01202316o13 00029818 00108936 00155202 00170049 00209790 00401566o14 00044849 00108034 00166631 00178764 00226294 00486647o15 00084143 00199868 00271149 00290795 00355966 00606914o16 00009067 00078736 00124531 00139218 00187929 00455700o17 00066611 00216195 00273388 00290307 00346110 00691573o18 00846863 01344580 01508170 01485688 01628176 02036186o19 00037608 00117982 00171030 00179751 00225298 00466090o20 00214701 00348032 00433397 00448676 00516594 00936080

MCMC Diagnostics

Replicate Chains 5Burn Time 300

Journal of Statistical Software 43

Draws per Chain 20 Total Draws 100Potential Scale Reduction (GampRs sqrt(Rhat))

Max 1003116Med 09992194IQR 00004545115

Rgt cor(em apply(b$em 2 median))

[1] 09187894

Rgt cor(ep apply(b$ep 2 median))

[1] 0971649

Rgt mean(apply(b$net c(2 3) median) == g)

[1] 1

Although the priors do not reflect the true error distribution bbnam still does a good job ofpinning down the error rates (and the network itself which is actually somewhat easier toestimate in many cases) In practice the bbnam model is fairly robust to choice of priorsso long as the error rate priors do not put a large degree of mass on the ldquoperverserdquo regionfor which em + ep gt 1 Multiple actors whose error rates satisfy this condition with highprobability in the posterior or posterior graph distributions which are strongly multimodalcan be indicators either of excessively ldquoperverserdquo priors or of extreme disagreement amonginformants (eg as would result from systematic deception) Either possibility warrants are-examination of both the userrsquos modeling assumptions and of the data itself

Having obtained a Bayesian point estimate we can also evaluate the performance of variousclassical network estimators The consensus function allows us to calculate several includingthe union and intersection LAS central graph and Romney-Batchelder model

Rgt mean(consensus(dat method = LASintersection) == g)

[1] 07725

Rgt mean(consensus(dat method = LASunion) == g)

[1] 0905

Rgt mean(consensus(dat method = centralgraph) == g)

[1] 09575

Rgt mean(consensus(dat method = romneybatchelder) == g)

44 Social Network Analysis with sna

Estimated competency scores[1] 05384305 05152780 04482434 05333154 07128820 05920044 06278100[8] 07532642 03863239 05535066 05120474 06065419 05147395 06447705[15] 06046575 06121955 07115359 03448647 03351731 04501279Estimated bias parameters[1] 013137940 035170786 006013660 028684742 009962490 004767398[7] 008915006 015302781 022559772 007431412 011489655 015412247[13] 005894590 008052288 009550557 006195760 014675686 024625026[19] 004302486 010195838[1] 1

For this scenario the intersection LAS is an especially poor choice (since it exacerbates theeffects of false negatives) the central graph and Romney-Batchelder models are far betterThe performance of the central graph will degrade quickly however when either false positiveor false negative rates approach or exceed 05 The two likelihood-based methods (bbnam andRomney-Batchelder) can still be quite robust in such such cases provided that total errorrates (false positive plus false negative) are less than 1

As a final example of snarsquos model-based methods we here illustrate the use of lnam to fit alinear network autocorrelation model We show in this case an example which includes bothAR and MA components estimating both effects simultaneously (This example requires thenumDeriv package)

Rgt w1 lt- rgraph(50)

Rgt w2 lt- rgraph(50)

Rgt x lt- matrix(rnorm(50 5) 50 5)

Rgt r1 lt- 02

Rgt r2 lt- 03

Rgt sigma lt- 01

Rgt beta lt- rnorm(5)

Rgt nu lt- rnorm(50 0 sigma)

Rgt e lt- qrsolve(diag(50) - r2 w2 nu)

Rgt y lt- qrsolve(diag(50) - r1 w1 x beta + e)

Rgt fit lt- lnam(y x w1 w2)

Rgt summary(fit)

Calllnam(y = y x = x W1 = w1 W2 = w2)

ResidualsMin 1Q Median 3Q Max

-052052 -018305 001156 015557 062082

CoefficientsEstimate Std Error Z value Pr(gt|z|)

X1 -0331259 0010831 -3058 lt2e-16 X2 0535608 0009448 5669 lt2e-16 X3 -0685068 0007138 -9598 lt2e-16

Journal of Statistical Software 45




Draws per Chain: 20 Total Draws: 100
Potential Scale Reduction (G&R's sqrt(Rhat)):
  Max: 1.003116
  Med: 0.9992194
  IQR: 0.0004545115

R> cor(em, apply(b$em, 2, median))
[1] 0.9187894

R> cor(ep, apply(b$ep, 2, median))
[1] 0.971649

R> mean(apply(b$net, c(2, 3), median) == g)
[1] 1

Although the priors do not reflect the true error distribution, bbnam still does a good job of pinning down the error rates (and the network itself, which is actually somewhat easier to estimate in many cases). In practice, the bbnam model is fairly robust to choice of priors, so long as the error rate priors do not put a large degree of mass on the "perverse" region for which em + ep > 1. Multiple actors whose error rates satisfy this condition with high probability in the posterior, or posterior graph distributions which are strongly multimodal, can be indicators either of excessively "perverse" priors or of extreme disagreement among informants (e.g., as would result from systematic deception). Either possibility warrants a re-examination of both the user's modeling assumptions and of the data itself.
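The "perverse" region can be made concrete with a small base R sketch. It assumes, purely for illustration, independent Beta priors on the false negative rate (em) and false positive rate (ep); the helper names perverse_mass and report_llr are hypothetical and are not part of sna. A positive report carries the likelihood ratio (1 - em)/ep in favor of a tie, which drops below 1 exactly when em + ep > 1, so an informant in that region is, in effect, reporting the opposite of what is present.

```r
# Log-likelihood ratio that a single positive report contributes in favor of
# "tie present", given false negative rate em and false positive rate ep.
# (Hypothetical helper, not part of sna.)
report_llr <- function(em, ep) log((1 - em) / ep)

# Monte Carlo estimate of the prior mass on the region em + ep > 1, assuming
# independent Beta priors given as c(shape1, shape2) pairs.
# (Hypothetical helper, not part of sna.)
perverse_mass <- function(em_shape, ep_shape, draws = 1e5) {
  em <- rbeta(draws, em_shape[1], em_shape[2])  # false negative rate draws
  ep <- rbeta(draws, ep_shape[1], ep_shape[2])  # false positive rate draws
  mean(em + ep > 1)
}

set.seed(1)
informative <- perverse_mass(c(1, 11), c(1, 11))  # priors concentrated near 0
flat        <- perverse_mass(c(1, 1), c(1, 1))    # uniform priors on [0, 1]

llr_sensible <- report_llr(0.2, 0.1)  # em + ep < 1: a report supports the tie
llr_perverse <- report_llr(0.7, 0.6)  # em + ep > 1: a report counts against it
```

Under these assumptions the uniform priors place roughly half of their mass on the perverse region, while the concentrated priors place almost none there.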

Having obtained a Bayesian point estimate, we can also evaluate the performance of various classical network estimators. The consensus function allows us to calculate several, including the union and intersection LAS, central graph, and Romney-Batchelder model:

R> mean(consensus(dat, method = "LAS.intersection") == g)
[1] 0.7725

R> mean(consensus(dat, method = "LAS.union") == g)
[1] 0.905

R> mean(consensus(dat, method = "central.graph") == g)
[1] 0.9575

R> mean(consensus(dat, method = "romney.batchelder") == g)
Estimated competency scores:
 [1] 0.5384305 0.5152780 0.4482434 0.5333154 0.7128820 0.5920044 0.6278100
 [8] 0.7532642 0.3863239 0.5535066 0.5120474 0.6065419 0.5147395 0.6447705
[15] 0.6046575 0.6121955 0.7115359 0.3448647 0.3351731 0.4501279
Estimated bias parameters:
 [1] 0.13137940 0.35170786 0.06013660 0.28684742 0.09962490 0.04767398
 [7] 0.08915006 0.15302781 0.22559772 0.07431412 0.11489655 0.15412247
[13] 0.05894590 0.08052288 0.09550557 0.06195760 0.14675686 0.24625026
[19] 0.04302486 0.10195838
[1] 1

For this scenario, the intersection LAS is an especially poor choice (since it exacerbates the effects of false negatives); the central graph and Romney-Batchelder models are far better. The performance of the central graph will degrade quickly, however, when either false positive or false negative rates approach or exceed 0.5. The two likelihood-based methods (bbnam and Romney-Batchelder) can still be quite robust in such cases, provided that total error rates (false positive plus false negative) are less than 1.
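The effect of false negatives on the intersection LAS can be reproduced with a small base R simulation. This is a minimal sketch under stated assumptions, not the implementation used by consensus: every informant shares the same error rates (em = 0.3, ep = 0.05, chosen here for illustration), the LAS rules use only the reports of the two actors in each dyad, and the central graph is taken to be a strict majority vote across informants.

```r
set.seed(42)
n  <- 20    # actors, each also acting as an informant
em <- 0.3   # false negative rate (assumed, shared by all informants)
ep <- 0.05  # false positive rate (assumed, shared by all informants)

# True directed graph with no self-ties.
g <- matrix(rbinom(n * n, 1, 0.5), n, n)
diag(g) <- 0

# dat[i, j, k] is informant i's report on the j -> k tie.
p_report <- ifelse(g == 1, 1 - em, ep)
dat <- array(0, dim = c(n, n, n))
for (i in seq_len(n)) {
  dat[i, , ] <- rbinom(n * n, 1, p_report)
}

# LAS rules: keep only the sender's and the receiver's own reports per dyad.
sender   <- matrix(0, n, n)  # sender[j, k]   = dat[j, j, k]
receiver <- matrix(0, n, n)  # receiver[j, k] = dat[k, j, k]
for (a in seq_len(n)) {
  sender[a, ]   <- dat[a, a, ]
  receiver[, a] <- dat[a, , a]
}
las_union        <- pmax(sender, receiver)  # tie if either party reports it
las_intersection <- pmin(sender, receiver)  # tie only if both parties report it

# Central graph, taken here as a strict majority vote over all informants.
central_graph <- (apply(dat, c(2, 3), mean) > 0.5) * 1

# Proportion of off-diagonal dyads recovered correctly by each estimator.
off_diag <- row(g) != col(g)
accuracy <- function(est) mean(est[off_diag] == g[off_diag])
acc <- c(intersection = accuracy(las_intersection),
         union        = accuracy(las_union),
         central      = accuracy(central_graph))
```

Because the intersection rule requires two independent positive reports, a true tie survives with probability (1 - em)^2 only, which is why its accuracy falls well below that of the other two rules in this setting.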

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
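For orientation, the model simulated in this example can be written out as follows. The notation is an assumption made here for readability: \rho_1 and \rho_2 stand for the AR and MA coefficients (r1 and r2 in the code), W_1 and W_2 for the corresponding weight matrices, and \nu for the vector of independent normal disturbances with standard deviation \sigma.

```latex
\begin{align}
  y &= \rho_1 W_1 y + X\beta + e, \\
  e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I), \\
\intertext{which, provided both matrices are invertible, gives the reduced forms}
  e &= (I - \rho_2 W_2)^{-1} \nu, \\
  y &= (I - \rho_1 W_1)^{-1} (X\beta + e).
\end{align}
```

The two reduced forms correspond to the two qr.solve calls in the simulation: the first produces the effective errors e from \nu, and the second produces y from X\beta + e.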

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
  Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
  Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
  Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
  AIC: -100.9 BIC: -85.65

Null model: meanstd
  Null log likelihood: -82.48 on 48 degrees of freedom
  AIC: 169.0 BIC: 172.8
  AIC difference (model versus null): 269.9
  Heuristic Log Bayes Factor (model versus null): 258.4
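The information criteria in this summary can be checked by hand from the reported log likelihoods. The sketch below uses only base R and the standard definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n); the parameter counts are inferred from the reported degrees of freedom (k = 50 - 42 = 8 for the model, k = 50 - 48 = 2 for the null), and reading the heuristic log Bayes factor as the BIC difference is an assumption that happens to match the printed value.

```r
n <- 50  # number of observations in the example

# Standard information criteria from a log likelihood and a parameter count.
info_criteria <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + k * log(n))
}

# Parameter counts inferred from the reported degrees of freedom.
model_ic <- info_criteria(loglik = 58.47,  k = n - 42, n = n)
null_ic  <- info_criteria(loglik = -82.48, k = n - 48, n = n)

aic_difference <- null_ic[["AIC"]] - model_ic[["AIC"]]
# Assumed reading: the heuristic log Bayes factor is the BIC difference.
log_bayes_factor <- null_ic[["BIC"]] - model_ic[["BIC"]]

round(c(model_ic, null_ic, aic_difference, log_bayes_factor), 2)
```

The recomputed values agree with the printed summary up to rounding of the log likelihoods, which suggests that the reported degrees of freedom already account for Sigma as an estimated parameter.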

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
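The two-standard-deviation rule described above can be sketched in base R. The helper flag_influence is hypothetical and is not part of sna; how plot.lnam constructs the net influence matrix itself is not specified in this text, so the example simply applies the rule to a toy matrix in which inf[i, j] is taken to be i's net influence on j.

```r
# Hypothetical helper (not part of sna): flag (i, j) pairs whose net influence
# lies at least `k` standard deviations above or below the mean, ignoring the
# diagonal (self-influence).
flag_influence <- function(inf, k = 2) {
  diag(inf) <- NA
  centre <- mean(inf, na.rm = TRUE)
  spread <- sd(as.vector(inf), na.rm = TRUE)
  list(
    green = which(inf >= centre + k * spread, arr.ind = TRUE),  # unusually high
    red   = which(inf <= centre - k * spread, arr.ind = TRUE)   # unusually low
  )
}

# Toy influence matrix: one strongly positive and one strongly negative pair.
inf <- matrix(0, 5, 5)
inf[1, 2] <- 10
inf[3, 4] <- -10

flags <- flag_influence(inf)
```

In this toy case the pair (1, 2) would be drawn as a green edge and the pair (3, 4) as a red edge; all other pairs fall within two standard deviations of the mean and would not be drawn.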

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam. Panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot (theoretical vs. sample quantiles); Net Influence Plot.


Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
          • Package highlights
            • Random graph generation
              • Example
                • Visualization and data manipulation
                  • Neighborhood and ego net functions
                  • Visualization
                    • Descriptive indices
                      • Node-level indices
                      • Graph-level indices
                        • Connectivity and subgraph statistics
                          • Example
                            • Position and role analysis
                              • Example
                                • Exploratory edge set comparison
                                  • Example
                                    • Network inference and process models
                                      • Example
                                          • Closing comments
Page 44: Social Network Analysis with sna package

44 Social Network Analysis with sna

Estimated competency scores[1] 05384305 05152780 04482434 05333154 07128820 05920044 06278100[8] 07532642 03863239 05535066 05120474 06065419 05147395 06447705[15] 06046575 06121955 07115359 03448647 03351731 04501279Estimated bias parameters[1] 013137940 035170786 006013660 028684742 009962490 004767398[7] 008915006 015302781 022559772 007431412 011489655 015412247[13] 005894590 008052288 009550557 006195760 014675686 024625026[19] 004302486 010195838[1] 1

For this scenario the intersection LAS is an especially poor choice (since it exacerbates theeffects of false negatives) the central graph and Romney-Batchelder models are far betterThe performance of the central graph will degrade quickly however when either false positiveor false negative rates approach or exceed 05 The two likelihood-based methods (bbnam andRomney-Batchelder) can still be quite robust in such such cases provided that total errorrates (false positive plus false negative) are less than 1

As a final example of sna's model-based methods, we here illustrate the use of lnam to fit a linear network autocorrelation model. We show in this case an example which includes both AR and MA components, estimating both effects simultaneously. (This example requires the numDeriv package.)
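For orientation, the data-generating process simulated in this example can be written as a pair of equations. The notation is read off the simulation code that follows rather than quoted from the paper: $W_1$ and $W_2$ stand for w1 and w2, $\rho_1$ and $\rho_2$ for r1 and r2, and the symbols themselves are chosen here for illustration.

```latex
\begin{align}
  y &= \rho_1 W_1 y + X\beta + e, \\
  e &= \rho_2 W_2 e + \nu, \qquad \nu \sim N(0, \sigma^2 I),
\end{align}
which, solving each equation for its left-hand side, gives
\begin{align}
  e &= (I - \rho_2 W_2)^{-1} \nu, \\
  y &= (I - \rho_1 W_1)^{-1} (X\beta + e).
\end{align}
```

The first equation carries the AR component (dependence of y on the y values of neighbors under $W_1$) and the second the MA component (dependence among disturbances under $W_2$); the two solved forms correspond to the two qr.solve calls in the code.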

R> w1 <- rgraph(50)
R> w2 <- rgraph(50)
R> x <- matrix(rnorm(50 * 5), 50, 5)
R> r1 <- 0.2
R> r2 <- 0.3
R> sigma <- 0.1
R> beta <- rnorm(5)
R> nu <- rnorm(50, 0, sigma)
R> e <- qr.solve(diag(50) - r2 * w2, nu)
R> y <- qr.solve(diag(50) - r1 * w1, x %*% beta + e)
R> fit <- lnam(y, x, w1, w2)
R> summary(fit)

Call:
lnam(y = y, x = x, W1 = w1, W2 = w2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52052 -0.18305  0.01156  0.15557  0.62082

Coefficients:
        Estimate Std. Error Z value Pr(>|z|)
X1     -0.331259   0.010831  -30.58   <2e-16 ***
X2      0.535608   0.009448   56.69   <2e-16 ***
X3     -0.685068   0.007138  -95.98   <2e-16 ***
X4      0.691812   0.008417   82.19   <2e-16 ***
X5      0.016491   0.007890    2.09   0.0366 *
rho1.1  0.194935   0.002575   75.71   <2e-16 ***
rho2.1  0.307491   0.021167   14.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Estimate Std. Error
Sigma  0.09597   9.22e-05

Goodness-of-Fit:
    Residual standard error: 0.2913 on 43 degrees of freedom (w/o Sigma)
    Multiple R-Squared: 0.96, Adjusted R-Squared: 0.9534
    Model log likelihood: 58.47 on 42 degrees of freedom (w/Sigma)
    AIC: -100.9 BIC: -85.65

    Null model: meanstd
    Null log likelihood: -82.48 on 48 degrees of freedom
    AIC: 169.0 BIC: 172.8
    AIC difference (model versus null): 269.9
    Heuristic Log Bayes Factor (model versus null): 258.4
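The information criteria in this summary can be checked by hand from the reported log likelihoods, using the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n. The sketch below assumes n = 50 observations and takes the parameter counts k from the reported degrees of freedom (50 - 42 = 8 for the fitted model, 50 - 48 = 2 for the null model); these counts are inferred here rather than stated in the output. Because the inputs are the rounded log likelihoods, agreement with the printed values is only to about one decimal place.

```r
n <- 50                           # observations in the simulated example

ll_model <- 58.47                 # model log likelihood (as reported, rounded)
ll_null  <- -82.48                # null log likelihood (as reported, rounded)
k_model  <- n - 42                # inferred count: 5 betas, 2 rhos, sigma
k_null   <- n - 48                # inferred count: mean and standard deviation

# Standard definitions of the two criteria
aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

res <- c(aic_model = aic(ll_model, k_model),
         bic_model = bic(ll_model, k_model, n),
         aic_null  = aic(ll_null, k_null),
         bic_null  = bic(ll_null, k_null, n))
round(res, 2)

# Differences, null minus model
round(c(aic_diff = unname(res["aic_null"] - res["aic_model"]),
        bic_diff = unname(res["bic_null"] - res["bic_model"])), 2)
```

Under these assumptions the recomputed criteria line up with the printed ones, and the BIC difference is numerically close to the value reported as the heuristic log Bayes factor, which suggests (but does not by itself establish) that the latter is computed as a difference in BICs.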

In addition to the above diagnostics, plot(fit) produces residual plots and a "net influence plot," which depicts the total influence of each vertex on each other vertex in network form. (i, j) pairs for which i's net influence on j is estimated to be at least two standard deviations greater than the mean net influence are designated by green edges, while corresponding pairs for which i's net influence on j is estimated to be at least two standard deviations lower (i.e., more negative) than the mean net influence are designated by red edges. Sample output for the above example is provided in Figure 6.
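The two-standard-deviation rule described above is straightforward to sketch in base R. The block below is a rough illustration, not the package's code: it assumes that net influence can be summarized by the sum of the AR and MA multiplier matrices, `solve(diag(n) - r1 * w1)` and `solve(diag(n) - r2 * w2)`, it uses made-up row-normalized weight matrices and coefficients, and the object names (`rand_w`, `inf`, `z`, `green`, `red`) are hypothetical. The calculation actually performed by plot(fit), including the orientation of (i, j), may differ in its details.

```r
set.seed(2)                        # arbitrary seed, for reproducibility only
n  <- 30                           # number of vertices (assumed)
r1 <- 0.2                          # illustrative AR coefficient
r2 <- 0.3                          # illustrative MA coefficient

# Made-up row-normalized weight matrix without loops (hypothetical helper)
rand_w <- function(n) {
  w <- matrix(rbinom(n * n, 1, 0.15), n, n)
  diag(w) <- 0
  w / pmax(rowSums(w), 1)          # row-normalize; isolates keep all-zero rows
}
w1 <- rand_w(n)
w2 <- rand_w(n)

# Assumed net influence matrix: sum of the AR and MA multiplier matrices
inf <- solve(diag(n) - r1 * w1) + solve(diag(n) - r2 * w2)
diag(inf) <- NA                    # self-influence is excluded

# Standardize the off-diagonal entries against their mean and SD
z <- (inf - mean(inf, na.rm = TRUE)) / sd(as.vector(inf), na.rm = TRUE)

green <- which(z >=  2, arr.ind = TRUE)   # at least 2 SD above the mean
red   <- which(z <= -2, arr.ind = TRUE)   # at least 2 SD below the mean

c(green = nrow(green), red = nrow(red))
```

Row normalization is used here only to guarantee that both matrices being inverted are nonsingular; it is a convenience of the sketch and is not part of the example above, which uses unnormalized rgraph output.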

3. Closing comments

The methodological literature on social network analysis is large and growing, and no one package can hope to implement all known measures and techniques. sna provides a collection of routines which is diverse, and which covers many of the methods currently seeing wide use within the field. Together with the other packages of the statnet ensemble, it is hoped that the inclusion of such tools within a freely available, widely used statistical computing platform will help further the integration of network analytic methods with more conventional approaches to modern data analysis.

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashion, including (but not limited to) David Barron, Matthijs den Besten, Alex Montgomery, David Krackhardt, David Dekker, Kurt Hornik, Ulrik Brandes, Mark S. Handcock, and the statnet team. This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05, subaward 918197, and by NSF award IIS-0331707.

Figure 6: Plot method output for lnam (four panels: Fitted vs. Observed Values; Fitted Values vs. Estimated Disturbances; Normal Q-Q Residual Plot; Net Influence Plot).


Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/, published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008. Submitted: 2007-06-01. Accepted: 2007-12-25.

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
          • Package highlights
            • Random graph generation
              • Example
                • Visualization and data manipulation
                  • Neighborhood and ego net functions
                  • Visualization
                    • Descriptive indices
                      • Node-level indices
                      • Graph-level indices
                        • Connectivity and subgraph statistics
                          • Example
                            • Position and role analysis
                              • Example
                                • Exploratory edge set comparison
                                  • Example
                                    • Network inference and process models
                                      • Example
                                          • Closing comments
Page 45: Social Network Analysis with sna package

Journal of Statistical Software 45

X4 0691812 0008417 8219 lt2e-16 X5 0016491 0007890 209 00366 rho11 0194935 0002575 7571 lt2e-16 rho21 0307491 0021167 1453 lt2e-16 ---Signif codes 0 ` 0001 ` 001 ` 005 ` 01 ` 1

Estimate Std ErrorSigma 009597 922e-05

Goodness-of-FitResidual standard error 02913 on 43 degrees of freedom (wo Sigma)Multiple R-Squared 096 Adjusted R-Squared 09534Model log likelihood 5847 on 42 degrees of freedom (wSigma)AIC -1009 BIC -8565

Null model meanstdNull log likelihood -8248 on 48 degrees of freedomAIC 1690 BIC 1728AIC difference (model versus null) 2699Heuristic Log Bayes Factor (model versus null) 2584

In addition to the above diagnostics plot(fit) produces residual plots and a ldquonet influenceplotrdquo which depicts the total influence of each vertex on each other vertex in network form(i j) pairs for which irsquos net influence on j is estimated to be at least two standard deviationsgreater than the mean net influence are designated by green edges while corresponding pairsfor which irsquos net influence on j is estimated to be at least two standard deviations lower (iemore negative) than the mean net influence are designated by red edges Sample output forthe above example is provided in Figure 6

3 Closing comments

The methodological literature on social network analysis is large and growing and no onepackage can hope to implement all known measures and techniques sna provides a collectionof routines which is diverse and which covers many of the methods currently seeing wideuse within the field Together with the other packages of the statnet ensemble it is hopedthat the inclusion of such tools within a freely available widely used statistical computingplatform will help further the integration of network analytic methods with more conventionalapproaches to modern data analysis

Acknowledgments

The author would like to thank the many persons who have contributed to sna in some fashionincluding (but not limited to) David Barron Matthijs den Besten Alex Montgomery DavidKrackhardt David Dekker Kurt Hornik Ulrik Brandes Mark S Handcock and the statnet

46 Social Network Analysis with sna

minus3 minus2 minus1 0 1 2

minus3minus2

minus10

12

Fitted vs Observed Values

y

y

minus3 minus2 minus1 0 1 2

minus02

minus01

00

01

02

Fitted Values vs Estimated Disturbances

y

ν

minus2 minus1 0 1 2

minus04

minus02

00

02

04

06

Normal QminusQ Residual Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Net Influence Plot

Figure 6 Plot method output for lnam

team This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05 subaward 918197 and by NSF award IIS-0331707

References

Anselin L (1988) Spatial Econometrics Methods and Models Kluwer Norwell MA

Banks D Carley KM (1994) ldquoMetric Inference for Social Networksrdquo Journal of Classification11(1) 121ndash149

Batagelj V Mrvar A (2007) Pajek Package for Large Network Analysis University ofLjubljana Slovenia URL httpvladofmfuni-ljsipubnetworkspajek

Batchelder WH Romney AK (1988) ldquoTest Theory Without an Answer Keyrdquo Psychometrika53(1) 71ndash92

Bonacich P (1987) ldquoPower and Centrality A Family of Measuresrdquo American Journal ofSociology 92 1170ndash1182

Journal of Statistical Software 47

Boorman SA White HC (1976) ldquoSocial Structure from Multiple Networks II Role Struc-turesrdquo American Journal of Sociology 81 1384ndash1446

Borgatti SP (2007) NetDraw Network Visualization Software Version 2067 URL httpwwwanalytictechcom

Borgatti SP Carley K Krackhardt D (2006) ldquoRobustness of Centrality Measures UnderConditions of Imperfect Datardquo Social Networks 28 124ndash136

Borgatti SP Everett MG Freeman LC (1999) UCINET 60 for Windows Software forSocial Network Analysis Analytic Technologies Natick URL httpwwwanalytictechcom

Boyd JP (1969) ldquoThe Algebra of Group Kinshiprdquo Journal of Mathematical Psychology 6139ndash167

Brandes U Erlebach T (eds) (2005) Network Analysis Methodological FoundationsSpringer-Verlag Berlin

Brandes U Kenis P Wagner D (2003) ldquoCommunicating Centrality in Policy Network Draw-ingsrdquo IEEE Transactions on Visualization and Computer Graphics 9(2) 241ndash253

Breiger RL Boorman SA Arabie P (1975) ldquoAn Algorithm for Clustering Relational Data withApplications to Social Network Analysis and Comparison with Multidimensional ScalingrdquoJournal of Mathematical Psychology 12 323ndash383

Brockwell PJ Davis RA (1991) Time Series Theory and Methods Springer-Verlag NewYork second edition

Burt RS (1976) ldquoPositions In Networksrdquo Social Forces 55 93ndash122

Burt RS (1991) STRUCTURE Columbia University Software package version 42 URLhttpfacultychicagogsbeduronaldburtteaching

Butts CT (2003) ldquoNetwork Inference Error and Informant (In)Accuracy A Bayesian Ap-proachrdquo Social Networks 25(2) 103ndash140

Butts CT (2007) ldquoPermutation Models for Relational Datardquo Sociological Methodology 37257ndash281

Butts CT Carley KM (2001) ldquoMultivariate Methods for Interstructural Analysisrdquo CASOSworking paper Center for the Computational Analysis of Social and Organization SystemsCarnegie Mellon University

Butts CT Carley KM (2005) ldquoSome Simple Algorithms for Structural Comparisonrdquo Com-putational and Mathematical Organization Theory 11(4) 291ndash305

Butts CT Handcock MS Hunter DR (2007) network Classes for Relational Data StatnetProject httpstatnetprojectorg Seattle WA R package version 13 URL httpCRANR-projectorgpackage=network

Butts CT Pixley JE (2004) ldquoA Structural Approach to the Representation of Life HistoryDatardquo Journal of Mathematical Sociology 28(2) 81ndash124

48 Social Network Analysis with sna

Cliff AD Ord JK (1973) Spatial Autocorrelation Pion London

Davis JA Leinhardt S (1972) ldquoThe Structure of Positive Interpersonal Relations in SmallGroupsrdquo In J Berger (ed) ldquoSociological Theories in Progress Volume 2rdquo pp 218ndash251Houghton Mifflin Boston

Dodds PS Watts DJ Sabel CF (2003) ldquoInformation Exchange and the Robustness of Organi-zational Networksrdquo Proceedings of the National Academy of Sciences 100(2) 12516ndash12521

Doreian P (1990) ldquoNetwork Autocorrelation Models Problems and Prospectsrdquo In IDAGriffith (ed) ldquoSpatial Statistics Past Present and Futurerdquo pp 369ndash389 Institute ofMathematical Geography Ann Arbor

Doreian P Batagelj V Ferlioj A (2005) Generalized Blockmodeling Cambridge UniversityPress Cambridge

Fararo TJ (1981) ldquoBiased Networks and Social Structure Theorems Part Irdquo Social Networks3 137ndash159

Fararo TJ (1983) ldquoBiased Networks and the Strength of Weak Tiesrdquo Social Networks 51ndash11

Fararo TJ Sunshine MH (1964) A Study of a Biased Friendship Net Youth DevelopmentCenter Syracuse NY

Faust K (2007) ldquoVery Local Structure in Social Networksrdquo Sociological Methodology 37209ndash256

Frank O Strauss D (1986) ldquoMarkov Graphsrdquo Journal of the American Statistical Association81(395) 832ndash842

Freeman LC (1979) ldquoCentrality in Social Networks Conceptual Clarificationrdquo Social Net-works 1(3) 223ndash258

Freeman LC (2004) The Development of Social Network Analysis A Study in the Sociologyof Science Empirical Press Vancouver

Fruchterman TMJ Reingold EM (1991) ldquoGraph Drawing by Force-directed PlacementrdquoSoftware ndash Practice and Experience 21(11) 1129ndash1164

Geary R (1954) ldquoThe Contiguity Ratio and Spatial Mappingrdquo The Incorporated Statistician5 115ndash145

Gelman A Carlin JB Stern HS Rubin DB (1995) Bayesian Data Analysis Chapman ampHallCRC London

Gelman A Rubin DB (1992) ldquoInference from Iterative Simulation Using Multiple SequencesrdquoStatistical Science 7 457ndash511

Gentleman RC Carey VJ Bates DM Bolstad B Dettling M Dudoit S Ellis B GautierL Ge Y Gentry J Hornik K Hothorn T Huber W Iacus S Irizarry R Leisch F Li CMaechler M Rossini AJ Sawitzki G Smith C Smyth G Tierney L Yang JYH Zhang

Journal of Statistical Software 49

J (2004) ldquoBioconductor Open Software Development for Computational Biology andBioinformaticsrdquo Genome Biology 5 R80 URL httpgenomebiologycom2004510R80

Gilks WR Richardson S Spiegelhalter DJ (eds) (1996) Markov Chain Monte Carlo inPractice Chapman amp HallCRC New York

Gould R Fernandez R (1989) ldquoStructures of Mediation A Formal Approach to Brokeragein Transaction Networksrdquo Sociological Methodology 19 89ndash126

Hall KM (1970) ldquoAn r-dimensional Quadratic Placement Algorithmrdquo Management Science17 219ndash229

Handcock MS Hunter DR Butts CT Goodreau SM Morris M (2003) statnet Soft-ware Tools for the Statistical Modeling of Network Data Statnet Project httpstatnetprojectorg Seattle WA R package version 20 URL httpCRANR-projectorgpackage=statnet

Holland PW Leinhardt S (1970) ldquoA Method for Detecting Structure in Sociometric DatardquoAmerican Journal of Sociology 70 492ndash513

Hubert LJ (1987) Assignment Methods in Combinatorial Data Analysis Marcel DekkerNew York

Huisman M van Duijn MAJ (2003) ldquoStOCNET Software for the Statistical Analysis ofSocial Networksrdquo Connections 25(1) 7ndash26

Ingram P Roberts PW (2000) ldquoFriendships Among Competitors in the Sydney Hotel Indus-tryrdquo American Journal of Sociology 106 387ndash423

Kamada T Kawai S (1989) ldquoAn Algorithm for Drawing General Undirected Graphsrdquo Infor-mation Processing Letters 31(1) 7ndash15

Koenker R Ng P (2007) SparseM Sparse Linear Algebra R package version 073 URLhttpCRANR-projectorgpackage=SparseM

Krackhardt D (1987a) ldquoCognitive Social Structuresrdquo Social Networks 9(2) 109ndash134

Krackhardt D (1987b) ldquoQAP Partialling as a Test of Spuriousnessrdquo Social Networks 9(2)171ndash186

Krackhardt D (1988) ldquoPredicting with Networks Nonparametric Multiple Regression Anal-yses of Dyadic Datardquo Social Networks 10 359ndash382

Krackhardt D (1994) ldquoGraph Theoretical Dimensions of Informal Organizationsrdquo In KM Car-ley MJ Prietula (eds) ldquoComputational Organizational Theoryrdquo pp 88ndash111 LawrenceErlbaum Associates Hillsdale NJ

Krackhardt D Blythe J McGrath C (1994) ldquoKrackPlot 30 An Improved Network DrawingProgramrdquo Connections 17(2) 53ndash55

Leenders TTAJ (2002) ldquoModeling Social Influence Through Network Autocorrelation Con-structing the Weight Matrixrdquo Social Networks 24(1) 21ndash47

50 Social Network Analysis with sna

Marsden PV (2005) ldquoRecent Developments in Network Measurementrdquo In PJ CarringtonJ Scott S Wasserman (eds) ldquoModels and Methods in Social Network Analysisrdquo chapter 2pp 8ndash30 Cambridge University Press Cambridge

Mayhew BH (1984) ldquoBaseline Models of Sociological Phenomenardquo Journal of MathematicalSociology 9 259ndash281

Moran PAP (1950) ldquoNotes on Continuous Stochastic Phenomenardquo Biometrika 37 17ndash23

Pattison P Robins GL (2002) ldquoNeighbourhood-Based Models for Social Networksrdquo Socio-logical Methodology 32 301ndash337

Rapoport A (1957) ldquoA Contribution to the Theory of Random and Biased Netsrdquo Bulletinof Mathematical Biophysics 15 523ndash533

R Development Core Team (2007) R A Language and Environment for Statistical Com-puting R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0Version 261 URL httpwwwR-projectorg

Richards WD Seary AJ (2006) MultiNet for Windows Version 475 URL httpwwwsfuca~richardsMultinetPagesmultinethtm

Romney AK Weller SC Batchelder WH (1986) ldquoCulture as Consensus A Theory of Cultureand Informant Accuracyrdquo American Anthropologist 88(2) 313ndash338

Sabidussi G (1966) ldquoThe Centrality Index of a Graphrdquo Psychometrika 31 581ndash603

Shimbel A (1953) ldquoStructural Parameters of Communication Networksrdquo Bulletin of Mathe-matical Biophysics 15 501ndash507

Skvoretz J Fararo TJ Agneessens F (2004) ldquoAdvances in Biased Net Theory DefinitionsDerivations and Estimationsrdquo Social Networks 26 113ndash139

Snijders TAB (2001) SIENA Simulation Investigation for Empirical Network AnalysisVersion 31 URL httpstatgammarugnlsnijderssienahtml

Snijders TAB (2002) ldquoMarkov Chain Monte Carlo Estimation of Exponential Random GraphModelsrdquo Journal of Social Structure 3(2)

Stallman RM (2002) Free Software Free Society Selected Essays of Richard M StallmanGNU PressFree Software Foundation Boston MA

Stephenson K Zelen M (1989) ldquoRethinking Centrality Methods and Applicationsrdquo SocialNetworks 11 1ndash37

Stokman FN Van Veen FJAM (1981) GRADAP Graph Definition and Analysis Pack-age Userrsquos Manual Interuniversity Project Group GRADAP University of Amsterdam-Groningen-Nijmegen URL httpwwwassesscom

Wasserman S Robins G (2005) ldquoAn Introduction to Random Graphs Dependence Graphsand plowastrdquo In PJ Carrington J Scott S Wasserman (eds) ldquoModels and Methods in SocialNetwork Analysisrdquo chapter 10 pp 192ndash214 Cambridge University Press Cambridge

Journal of Statistical Software 51

Wasserman SS Faust K (1994) Social Network Analysis Methods and Applications Struc-tural Analysis in the Social Sciences Cambridge University Press Cambridge

Watts DJ Strogatz SH (1998) ldquoCollective Dynamics of lsquoSmall-Worldrsquo Networksrdquo Nature393 440ndash442

West DB (1996) Introduction to Graph Theory Prentice Hall Upper Saddle River NJ

White HC (1963) An Anatomy of Kinship Englewood Cliffs NJ Prentice Hall

Affiliation

Carter T ButtsDepartment of Sociology and Institute for Mathematical Behavioral SciencesUniversity of California IrvineIrvine CA 92697-5100 United States of AmericaE-mail buttscucieduURL httpwwwfacultyucieduprofilecfmfaculty_id=5057

Journal of Statistical Software httpwwwjstatsoftorgpublished by the American Statistical Association httpwwwamstatorg

Volume 24 Issue 6 Submitted 2007-06-01February 2008 Accepted 2007-12-25

  • Introduction and overview
    • Package history
    • sna and statnet
    • Functionality
    • Terminology and data representation
      • Importing relational data into R
          • Package highlights
            • Random graph generation
              • Example
                • Visualization and data manipulation
                  • Neighborhood and ego net functions
                  • Visualization
                    • Descriptive indices
                      • Node-level indices
                      • Graph-level indices
                        • Connectivity and subgraph statistics
                          • Example
                            • Position and role analysis
                              • Example
                                • Exploratory edge set comparison
                                  • Example
                                    • Network inference and process models
                                      • Example
                                          • Closing comments
Page 46: Social Network Analysis with sna package

46 Social Network Analysis with sna

minus3 minus2 minus1 0 1 2

minus3minus2

minus10

12

Fitted vs Observed Values

y

y

minus3 minus2 minus1 0 1 2

minus02

minus01

00

01

02

Fitted Values vs Estimated Disturbances

y

ν

minus2 minus1 0 1 2

minus04

minus02

00

02

04

06

Normal QminusQ Residual Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Net Influence Plot

Figure 6 Plot method output for lnam

team This paper is based upon work supported by National Institutes of Health award 5R01 DA012831-05 subaward 918197 and by NSF award IIS-0331707

References

Anselin L (1988) Spatial Econometrics Methods and Models Kluwer Norwell MA

Banks D Carley KM (1994) ldquoMetric Inference for Social Networksrdquo Journal of Classification11(1) 121ndash149

Batagelj V Mrvar A (2007) Pajek Package for Large Network Analysis University ofLjubljana Slovenia URL httpvladofmfuni-ljsipubnetworkspajek

Batchelder WH Romney AK (1988) ldquoTest Theory Without an Answer Keyrdquo Psychometrika53(1) 71ndash92

Bonacich P (1987) ldquoPower and Centrality A Family of Measuresrdquo American Journal ofSociology 92 1170ndash1182

Journal of Statistical Software 47

Boorman SA, White HC (1976). "Social Structure from Multiple Networks II: Role Structures." American Journal of Sociology, 81, 1384–1446.

Borgatti SP (2007). NetDraw: Network Visualization Software. Version 2.067, URL http://www.analytictech.com

Borgatti SP, Carley K, Krackhardt D (2006). "Robustness of Centrality Measures Under Conditions of Imperfect Data." Social Networks, 28, 124–136.

Borgatti SP, Everett MG, Freeman LC (1999). UCINET 6.0 for Windows: Software for Social Network Analysis. Analytic Technologies, Natick. URL http://www.analytictech.com

Boyd JP (1969). "The Algebra of Group Kinship." Journal of Mathematical Psychology, 6, 139–167.

Brandes U, Erlebach T (eds.) (2005). Network Analysis: Methodological Foundations. Springer-Verlag, Berlin.

Brandes U, Kenis P, Wagner D (2003). "Communicating Centrality in Policy Network Drawings." IEEE Transactions on Visualization and Computer Graphics, 9(2), 241–253.

Breiger RL, Boorman SA, Arabie P (1975). "An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling." Journal of Mathematical Psychology, 12, 323–383.

Brockwell PJ, Davis RA (1991). Time Series: Theory and Methods. Springer-Verlag, New York, second edition.

Burt RS (1976). "Positions In Networks." Social Forces, 55, 93–122.

Burt RS (1991). STRUCTURE. Columbia University. Software package, version 4.2, URL http://faculty.chicagogsb.edu/ronald.burt/teaching

Butts CT (2003). "Network Inference, Error, and Informant (In)Accuracy: A Bayesian Approach." Social Networks, 25(2), 103–140.

Butts CT (2007). "Permutation Models for Relational Data." Sociological Methodology, 37, 257–281.

Butts CT, Carley KM (2001). "Multivariate Methods for Interstructural Analysis." CASOS working paper, Center for the Computational Analysis of Social and Organization Systems, Carnegie Mellon University.

Butts CT, Carley KM (2005). "Some Simple Algorithms for Structural Comparison." Computational and Mathematical Organization Theory, 11(4), 291–305.

Butts CT, Handcock MS, Hunter DR (2007). network: Classes for Relational Data. Statnet Project http://statnetproject.org, Seattle, WA. R package version 1.3, URL http://CRAN.R-project.org/package=network

Butts CT, Pixley JE (2004). "A Structural Approach to the Representation of Life History Data." Journal of Mathematical Sociology, 28(2), 81–124.

Cliff AD, Ord JK (1973). Spatial Autocorrelation. Pion, London.

Davis JA, Leinhardt S (1972). "The Structure of Positive Interpersonal Relations in Small Groups." In J Berger (ed.), "Sociological Theories in Progress, Volume 2," pp. 218–251. Houghton Mifflin, Boston.

Dodds PS, Watts DJ, Sabel CF (2003). "Information Exchange and the Robustness of Organizational Networks." Proceedings of the National Academy of Sciences, 100(2), 12516–12521.

Doreian P (1990). "Network Autocorrelation Models: Problems and Prospects." In I DA Griffith (ed.), "Spatial Statistics: Past, Present, and Future," pp. 369–389. Institute of Mathematical Geography, Ann Arbor.

Doreian P, Batagelj V, Ferligoj A (2005). Generalized Blockmodeling. Cambridge University Press, Cambridge.

Fararo TJ (1981). "Biased Networks and Social Structure Theorems. Part I." Social Networks, 3, 137–159.

Fararo TJ (1983). "Biased Networks and the Strength of Weak Ties." Social Networks, 5, 1–11.

Fararo TJ, Sunshine MH (1964). A Study of a Biased Friendship Net. Youth Development Center, Syracuse, NY.

Faust K (2007). "Very Local Structure in Social Networks." Sociological Methodology, 37, 209–256.

Frank O, Strauss D (1986). "Markov Graphs." Journal of the American Statistical Association, 81(395), 832–842.

Freeman LC (1979). "Centrality in Social Networks: Conceptual Clarification." Social Networks, 1(3), 223–258.

Freeman LC (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver.

Fruchterman TMJ, Reingold EM (1991). "Graph Drawing by Force-directed Placement." Software – Practice and Experience, 21(11), 1129–1164.

Geary R (1954). "The Contiguity Ratio and Spatial Mapping." The Incorporated Statistician, 5, 115–145.

Gelman A, Carlin JB, Stern HS, Rubin DB (1995). Bayesian Data Analysis. Chapman & Hall/CRC, London.

Gelman A, Rubin DB (1992). "Inference from Iterative Simulation Using Multiple Sequences." Statistical Science, 7, 457–511.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J (2004). "Bioconductor: Open Software Development for Computational Biology and Bioinformatics." Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). "Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks." Sociological Methodology, 19, 89–126.

Hall KM (1970). "An r-dimensional Quadratic Placement Algorithm." Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet

Holland PW, Leinhardt S (1970). "A Method for Detecting Structure in Sociometric Data." American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). "StOCNET: Software for the Statistical Analysis of Social Networks." Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). "Friendships Among Competitors in the Sydney Hotel Industry." American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). "An Algorithm for Drawing General Undirected Graphs." Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM

Krackhardt D (1987a). "Cognitive Social Structures." Social Networks, 9(2), 109–134.

Krackhardt D (1987b). "QAP Partialling as a Test of Spuriousness." Social Networks, 9(2), 171–186.

Krackhardt D (1988). "Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data." Social Networks, 10, 359–382.

Krackhardt D (1994). "Graph Theoretical Dimensions of Informal Organizations." In KM Carley, MJ Prietula (eds.), "Computational Organizational Theory," pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). "KrackPlot 3.0: An Improved Network Drawing Program." Connections, 17(2), 53–55.

Leenders TTAJ (2002). "Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix." Social Networks, 24(1), 21–47.

Marsden PV (2005). "Recent Developments in Network Measurement." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). "Baseline Models of Sociological Phenomena." Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). "Notes on Continuous Stochastic Phenomena." Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). "Neighbourhood-Based Models for Social Networks." Sociological Methodology, 32, 301–337.

Rapoport A (1957). "A Contribution to the Theory of Random and Biased Nets." Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm

Romney AK, Weller SC, Batchelder WH (1986). "Culture as Consensus: A Theory of Culture and Informant Accuracy." American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). "The Centrality Index of a Graph." Psychometrika, 31, 581–603.

Shimbel A (1953). "Structural Parameters of Communication Networks." Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). "Advances in Biased Net Theory: Definitions, Derivations, and Estimations." Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html

Snijders TAB (2002). "Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). "Rethinking Centrality: Methods and Applications." Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP, Graph Definition and Analysis Package: User's Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com

Wasserman S, Robins G (2005). "An Introduction to Random Graphs, Dependence Graphs, and p*." In PJ Carrington, J Scott, S Wasserman (eds.), "Models and Methods in Social Network Analysis," chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). "Collective Dynamics of 'Small-World' Networks." Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Prentice Hall, Englewood Cliffs, NJ.

Affiliation

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org, published by the American Statistical Association, http://www.amstat.org

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25

J (2004). “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology, 5, R80. URL http://genomebiology.com/2004/5/10/R80.

Gilks WR, Richardson S, Spiegelhalter DJ (eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, New York.

Gould R, Fernandez R (1989). “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology, 19, 89–126.

Hall KM (1970). “An r-dimensional Quadratic Placement Algorithm.” Management Science, 17, 219–229.

Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Project http://statnetproject.org/, Seattle, WA. R package version 2.0, URL http://CRAN.R-project.org/package=statnet.

Holland PW, Leinhardt S (1970). “A Method for Detecting Structure in Sociometric Data.” American Journal of Sociology, 70, 492–513.

Hubert LJ (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker, New York.

Huisman M, van Duijn MAJ (2003). “StOCNET: Software for the Statistical Analysis of Social Networks.” Connections, 25(1), 7–26.

Ingram P, Roberts PW (2000). “Friendships Among Competitors in the Sydney Hotel Industry.” American Journal of Sociology, 106, 387–423.

Kamada T, Kawai S (1989). “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters, 31(1), 7–15.

Koenker R, Ng P (2007). SparseM: Sparse Linear Algebra. R package version 0.73, URL http://CRAN.R-project.org/package=SparseM.

Krackhardt D (1987a). “Cognitive Social Structures.” Social Networks, 9(2), 109–134.

Krackhardt D (1987b). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9(2), 171–186.

Krackhardt D (1988). “Predicting with Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks, 10, 359–382.

Krackhardt D (1994). “Graph Theoretical Dimensions of Informal Organizations.” In KM Carley, MJ Prietula (eds.), “Computational Organizational Theory,” pp. 88–111. Lawrence Erlbaum Associates, Hillsdale, NJ.

Krackhardt D, Blythe J, McGrath C (1994). “KrackPlot 3.0: An Improved Network Drawing Program.” Connections, 17(2), 53–55.

Leenders TTAJ (2002). “Modeling Social Influence Through Network Autocorrelation: Constructing the Weight Matrix.” Social Networks, 24(1), 21–47.

Marsden PV (2005). “Recent Developments in Network Measurement.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 2, pp. 8–30. Cambridge University Press, Cambridge.

Mayhew BH (1984). “Baseline Models of Sociological Phenomena.” Journal of Mathematical Sociology, 9, 259–281.

Moran PAP (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37, 17–23.

Pattison P, Robins GL (2002). “Neighbourhood-Based Models for Social Networks.” Sociological Methodology, 32, 301–337.

Rapoport A (1957). “A Contribution to the Theory of Random and Biased Nets.” Bulletin of Mathematical Biophysics, 15, 523–533.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Version 2.6.1, URL http://www.R-project.org/.

Richards WD, Seary AJ (2006). MultiNet for Windows. Version 4.75, URL http://www.sfu.ca/~richards/Multinet/Pages/multinet.htm.

Romney AK, Weller SC, Batchelder WH (1986). “Culture as Consensus: A Theory of Culture and Informant Accuracy.” American Anthropologist, 88(2), 313–338.

Sabidussi G (1966). “The Centrality Index of a Graph.” Psychometrika, 31, 581–603.

Shimbel A (1953). “Structural Parameters of Communication Networks.” Bulletin of Mathematical Biophysics, 15, 501–507.

Skvoretz J, Fararo TJ, Agneessens F (2004). “Advances in Biased Net Theory: Definitions, Derivations, and Estimations.” Social Networks, 26, 113–139.

Snijders TAB (2001). SIENA: Simulation Investigation for Empirical Network Analysis. Version 3.1, URL http://stat.gamma.rug.nl/snijders/siena.html.

Snijders TAB (2002). “Markov Chain Monte Carlo Estimation of Exponential Random Graph Models.” Journal of Social Structure, 3(2).

Stallman RM (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press/Free Software Foundation, Boston, MA.

Stephenson K, Zelen M (1989). “Rethinking Centrality: Methods and Applications.” Social Networks, 11, 1–37.

Stokman FN, Van Veen FJAM (1981). GRADAP: Graph Definition and Analysis Package User’s Manual. Interuniversity Project Group GRADAP, University of Amsterdam-Groningen-Nijmegen. URL http://www.assess.com/.

Wasserman S, Robins G (2005). “An Introduction to Random Graphs, Dependence Graphs, and p*.” In PJ Carrington, J Scott, S Wasserman (eds.), “Models and Methods in Social Network Analysis,” chapter 10, pp. 192–214. Cambridge University Press, Cambridge.

Wasserman SS, Faust K (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, Cambridge.

Watts DJ, Strogatz SH (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature, 393, 440–442.

West DB (1996). Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.

White HC (1963). An Anatomy of Kinship. Englewood Cliffs, NJ: Prentice Hall.

Affiliation:

Carter T. Butts
Department of Sociology and Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100, United States of America
E-mail: buttsc@uci.edu
URL: http://www.faculty.uci.edu/profile.cfm?faculty_id=5057

Journal of Statistical Software, http://www.jstatsoft.org/
published by the American Statistical Association, http://www.amstat.org/

Volume 24, Issue 6, February 2008
Submitted: 2007-06-01
Accepted: 2007-12-25
