Social Network Analysis
Fundamental Concepts in Social Network Analysis (Part 2)
Katarina Stanoevska-Slabeva, Miriam Meckel, Thomas Plotkowiak
© Thomas Plotkowiak 2010
Agenda
1. Intro
2. Measuring Networks
– Embedding Measures (Ties)
– Positions and Roles (Nodes)
– Group Concepts
3. Network Mechanisms
4. Network Theories
Introduction Knoke information exchange network
In 1978, Knoke & Wood collected data from workers at 95 organizations in Indianapolis. Respondents indicated with which other organizations their own organization had any of 13 different types of relationships.
The example used here is the exchange of information among ten organizations that were involved in the local political economy of social welfare services in a Midwestern city.
2. Network Measures 2.1 Network Measures for Actors Embedding Measures
Embedding Measures
• Reciprocity (Dyad Census)
• Transitivity (Triad Census)
• Clustering
• Density
• Group-external and group-internal Ties
• Other Network Mechanisms
Reciprocity
• With symmetric data two actors are either connected or not.
• With directed data there are four possible dyadic relationships:
– A and B are not connected
– A sends to B
– B sends to A
– A and B send to each other.
Reciprocity II
• What is the reciprocity in this network?
– Answer 1: share of pairs that have reciprocated ties / all possible pairs
• AB of {AB, AC, BC} = 0.33
– Answer 2: share of pairs that have reciprocated ties / existing (connected) pairs
• AB of {AB, BC} = 0.5
– Answer 3: share of reciprocated directed ties / all possible directed ties
• {AB, BA} of {AB, BA, AC, CA, BC, CB} = 0.33
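The three answers can be reproduced with a few lines of Python. This is a minimal sketch; the example network (A and B send to each other, B sends to C) and all variable names are assumptions chosen to match the numbers on the slide.

```python
from itertools import combinations

# Assumed example network: A <-> B is reciprocated, B -> C is one-directional.
nodes = ["A", "B", "C"]
arcs = {("A", "B"), ("B", "A"), ("B", "C")}

pairs = list(combinations(nodes, 2))
# Pairs with ties in both directions
mutual = [(u, v) for u, v in pairs if (u, v) in arcs and (v, u) in arcs]
# Pairs with a tie in at least one direction
connected = [(u, v) for u, v in pairs if (u, v) in arcs or (v, u) in arcs]

answer1 = len(mutual) / len(pairs)            # reciprocated pairs / all possible pairs
answer2 = len(mutual) / len(connected)        # reciprocated pairs / connected pairs
possible_arcs = len(nodes) * (len(nodes) - 1)
answer3 = 2 * len(mutual) / possible_arcs     # reciprocated arcs / all possible arcs

print(round(answer1, 2), round(answer2, 2), round(answer3, 2))
```

Which of the three definitions is appropriate depends on whether unconnected pairs should count against reciprocity.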
Transitivity
• With undirected data there are four possible types of triadic relations:
– No ties
– One tie
– Two ties
– Three ties
• The count of the relative prevalence of these four types of relations is called the "triad census". A population can be characterized by:
– "isolation"
– "couples only"
– "structural holes" (one actor is connected to two others, who are not connected to each other)
– or "clusters"
Transitivity II: Directed Networks
M-A-N number: M = number of mutual positive dyads, A = number of asymmetric dyads, N = number of null dyads
D = Down, U = Up, C = Cyclic, T = Transitive
Triad Census Models
• Balance Model with Two Cliques (Heider Balance): triads are either 300 or 102
• Linear Hierarchy Model: every triad is 030T
• Ranked Clusters Model (Hierarchy of Cliques): triads 300, 102, 003, 120D, 120U, 030T, 021D, 021U
Example Directed information exchange network
The exchange of information among ten organizations that were involved in the local political economy of social welfare services in a Midwestern city.
[Figure: directed information exchange network among the ten organizations, nodes 1–10]
Transitivity III
• How to measure transitivity?
– A) Divide the number of transitive triads found by the total number of possible triplets (for 3 nodes there are 6 possibilities)
– B) Norm the number of transitive triads by the number of cases where a single link could complete the triad: norm {AB, BC, AC} by {AB, BC, anything} (for 3 nodes there are 4 possibilities)
[Figure: example triad with nodes A, B and C]
Transitivity IV
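A) 146/720: 146 transitive triads out of the 720 possible ordered triplets (10 × 9 × 8), about 20%
B) 146/217: 146 transitive triads out of the 217 cases where a single link could complete the triad, about 67%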
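Both normalizations can be computed by looping over ordered triples. The following is a sketch on a small hypothetical network (the four nodes and their arcs are assumptions, not the Knoke data); the function name `transitivity` is likewise illustrative.

```python
from itertools import permutations

def transitivity(nodes, arcs):
    """Return (transitive triples, completable triples, all ordered triples)."""
    completable = 0   # a->b and b->c present: a single arc a->c would close the triad
    transitive = 0    # a->b, b->c and a->c all present
    for a, b, c in permutations(nodes, 3):
        if (a, b) in arcs and (b, c) in arcs:
            completable += 1
            if (a, c) in arcs:
                transitive += 1
    n = len(nodes)
    return transitive, completable, n * (n - 1) * (n - 2)

# Hypothetical example network
example_nodes = [1, 2, 3, 4]
example_arcs = {(1, 2), (2, 3), (1, 3), (3, 4)}
result = transitivity(example_nodes, example_arcs)
t, c, total = result
print(f"measure A: {t}/{total}, measure B: {t}/{c}")
```

Measure A shrinks quickly in sparse networks because most ordered triples contain no two-path at all, which is why measure B is usually easier to interpret.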
Clustering
Most actors live in local neighborhoods and are connected to one another. A large proportion of the total number of ties is highly "clustered" into local neighborhoods.
VS.
Global clustering coefficient
C = number of closed triplets / number of all triplets (open and closed)
[Figure: a closed triplet and an open triplet]
Average Local Clustering coefficient
To measure how clustered the graph is, we examine the local neighborhood of each actor (all actors who are directly connected to ego) and calculate the density in this neighborhood (leaving out ego). After doing this for all actors, we can characterize the degree of clustering as the average over all neighborhoods.
[Figure: three example neighborhoods with C = 1, C = 1/3 and C = 0]
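The neighborhood density described here can be sketched as follows for undirected ties. The small network, with an ego whose three neighbors share a single tie, is an assumption that reproduces the C = 1/3 case; the function names are illustrative.

```python
from itertools import combinations

def local_clustering(adj, ego):
    """Density among ego's neighbors, leaving out ego itself."""
    neighbors = sorted(adj[ego])
    if len(neighbors) < 2:
        return 0.0  # convention assumed here: no pairs of neighbors means C = 0
    pairs = list(combinations(neighbors, 2))
    present = sum(1 for u, v in pairs if v in adj[u])
    return present / len(pairs)

def average_clustering(adj):
    """Average of the local coefficients over all actors."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Assumed example: ego has neighbors a, b, c; only a and b are tied to each other.
adj = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b"},
    "b": {"ego", "a"},
    "c": {"ego"},
}
c_ego = local_clustering(adj, "ego")
c_avg = average_clustering(adj)
print(c_ego, c_avg)
```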
Individual local clustering coefficient (in this case for directed ties)
Clustering can also be examined for each actor:
– Notice actor 6 has three neighbors and hence only 3 possible ties. Of these only one is present, so actor 6 is not highly clustered.
– Actor 8 has 6 neighbors and hence 15 pairs of neighbors and is highly clustered.
2 edges out of 6 edges
Density for groups
Instead of calculating the density of the whole network (last lecture), we can calculate the density of partitions of the network.
Groups: Governmental agencies, Non-governmental generalists, Welfare specialists
A social structure in which individuals were highly clustered would display a pattern of high densities on the diagonal, and low densities elsewhere.
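Block densities can be computed directly from the adjacency matrix. The sketch below uses the information exchange matrix from the adjacency-matrix slide (diagonal set to 0) and the group of actors 1, 3 and 5; lumping all remaining actors into one second group is an assumption made only for illustration.

```python
# Knoke information exchange matrix (row = sender, column = receiver).
KNOKE = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
]

def block_density(matrix, senders, receivers):
    """Share of possible directed ties from senders to receivers that are present.
    Actor numbers are 1-based; self-ties are excluded."""
    cells = [(i, j) for i in senders for j in receivers if i != j]
    present = sum(matrix[i - 1][j - 1] for i, j in cells)
    return present / len(cells)

group1 = [1, 3, 5]
others = [2, 4, 6, 7, 8, 9, 10]  # hypothetical second group, for illustration only

d11 = block_density(KNOKE, group1, group1)  # density inside group 1
d12 = block_density(KNOKE, group1, others)  # density from group 1 to the rest
print(round(d11, 4), round(d12, 4))
```

The within-group value matches the 4 of 6 possible ties among actors 1, 3 and 5 reported for the 1,1 block.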
Density for groups II
• Group 1 has dense in- and out-ties to one another and to the other populations
• Group 2 has out-ties among themselves and with group 1, and has high densities of in-ties with all three subpopulations
The extent to which those blocks characterize all the individuals within them can be assessed by looking at the standard deviations. The standard deviations measure the lack of homogeneity within the partition, or the extent to which the actors vary.
The density in the 1,1 block is .6667. That is, of the six possible directed ties among actors 1, 3, and 5, four are actually present.
E-I Index
• The E-I (external – internal) index takes the number of ties of group members to outsiders, subtracts the number of ties to other group members, and divides by the total number of ties.
(1-4)/7 = -3/7 (1-2)/7 = -1/7
E-I Index II
• The resulting E-I index ranges from -1 (all ties internal) to +1 (all ties external). The direction of ties is ignored.
• The E-I index can be applied at three levels: – entire population
– each group
– each individual
Notice: The relative size of subpopulations (e.g. 10 vs. 1000) has dramatic consequences for the degree of internal and external contacts, even when individuals choose contacts at random.
E-I Index for groups
Notice that the data has been symmetrized
E-I Index for the entire population
Internal: 7*2/64 = 22%, External: 25*2/64 = 78%, E-I: (50-14)/64 = 56%
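The same calculation can be written as a small function that works at the population, group and individual level. This is a sketch under assumptions: the five-node network and its group labels are hypothetical, ties are treated as undirected, and at group or individual level every tie touching a selected member is counted once.

```python
def ei_index(edges, group_of, members=None):
    """E-I index = (external - internal) / (external + internal).
    edges: undirected ties (u, v); group_of: node -> group label.
    members: optional set of nodes; only ties touching them are counted."""
    external = internal = 0
    for u, v in edges:
        if members is not None and u not in members and v not in members:
            continue
        if group_of[u] == group_of[v]:
            internal += 1
        else:
            external += 1
    total = external + internal
    return (external - internal) / total if total else 0.0

# Hypothetical network: group "a" = {1, 2, 3}, group "b" = {4, 5}
group_of = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]

population = ei_index(edges, group_of)        # entire population
group_b = ei_index(edges, group_of, {4, 5})   # one group
actor_3 = ei_index(edges, group_of, {3})      # one individual

# The arithmetic from the slide: 7 internal and 25 external undirected ties
slide_value = (25 - 7) / (25 + 7)
print(population, group_b, actor_3, slide_value)
```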
Notice that the data has been symmetrized
Permutation Tests
To assess whether the E-I index value is significantly different from what would be expected under random mixing, a permutation test is performed.
Notice: Under random mixing, the E-I index would be expected to have a value of .467, which is not much different from the observed .563, especially given the standard error of .078 (a difference of about .10 could arise just by chance).
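A permutation test of this kind can be sketched by shuffling the group labels over the nodes while keeping the ties fixed. The network, the number of trials and the seed below are assumptions; the exact procedure of any particular software package may differ.

```python
import random

def ei(edges, labels):
    """E-I index of an undirected network for the given node labels."""
    external = sum(1 for u, v in edges if labels[u] != labels[v])
    internal = len(edges) - external
    return (external - internal) / len(edges)

def permutation_test(edges, labels, trials=2000, seed=0):
    """Compare the observed E-I index with values under randomly permuted labels."""
    rng = random.Random(seed)
    observed = ei(edges, labels)
    nodes = list(labels)
    values = list(labels.values())
    simulated = []
    for _ in range(trials):
        rng.shuffle(values)  # reassign group labels at random, group sizes stay fixed
        simulated.append(ei(edges, dict(zip(nodes, values))))
    expected = sum(simulated) / trials
    # Share of permutations with an index at least as large as the observed one
    p_upper = sum(1 for s in simulated if s >= observed) / trials
    return observed, expected, p_upper

# Hypothetical example network
labels = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
observed, expected, p_upper = permutation_test(edges, labels)
print(observed, round(expected, 3), p_upper)
```

Keeping the group sizes fixed is what makes the comparison fair, since unequal group sizes alone already shift the expected index.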
E-I Index for individuals
Notice: Several actors (4, 6, 9) tend toward closure, while others (10, 1) tend toward creating ties outside their groups.
2. Network Measures 2.2 Network Measures for Actors Position & Roles
Positions & Roles
• Structural Equivalence
• Automorphic Equivalence
• Regular Equivalence
• Measuring similarity/dissimilarity
• Visualizing similarity and distance
• Measuring automorphic equivalence
• Measuring regular equivalence
• Blockmodelling
Chinese Kinship Relations
Positions and Roles
• Positions: Actors that show a similar structure of relationships and are thus similarly embedded into the network.
• Roles: The pattern of relationships of members of same or different positions.
• Note: Many of the category systems used by sociologists are based on "attributes" of individual actors that are common across actors.
Similarity
• The idea of "similarity" has to be rather precisely defined
• Nodes are similar if they fall in the same "equivalence class"
– We could come up with an equivalence class of out-degree of zero, for example
• There are three particular definitions of equivalence:
– Structural Equivalence
– Automorphic Equivalence (rarely used)
– Regular Equivalence
Structural Equivalence
• Structural Equivalence: Two structurally equivalent actors could exchange their positions in a network without changing their connections to the other actors in the network.
• Structural equivalence is the "strongest" form of equivalence.
• Problem: Imagine two teachers in Toronto and St. Gallen. Rather than looking for connections to exactly the same persons we would like to find connection to similar persons but not exactly the same ones.
Automorphic Equivalence
• Automorphic Equivalence: Two persons could exchange their positions in the network without changing the structure of the network (notice that after the exchange they would be partially connected to other persons than before).
• Problem: How large should the radius be within which we analyze the structure of the network (1, 2, 3 … steps)?
• For the one-step radius we consider the NUMBER of:
– asymmetric outgoing,
– asymmetric incoming,
– symmetric in- and outgoing,
– and non-existing ties.
1-Step, 2-Step Equivalence
[Figure: example network comparing 1-step and 2-step equivalence]
Regular Equivalence
• Regular Equivalence: Two positions are considered similar if every important aspect of the observed structure applies (or does not apply) to both positions.
• For the one-step radius we consider the EXISTENCE of:
– asymmetric outgoing,
– asymmetric incoming,
– symmetric in- and outgoing,
– and non-existing ties.
[Figure: three example trees, each with root A, children B and C, and leaves D to H]
• B and C are regularly equivalent
• B and C are automorphically equivalent
• B and C are structurally equivalent
Computing Positional Similarity Example Information exchange network
Measuring Similarity Adjacency Matrix
          1 Coun  2 Comm  3 Educ  4 Indu  5 Mayr  6 WRO   7 News  8 UWay  9 Welf  10 West
1 Coun    ---     1       0       0       1       0       1       0       1       0
2 Comm    1       ---     1       1       1       0       1       1       1       0
3 Educ    0       1       ---     1       1       1       1       0       0       1
4 Indu    1       1       0       ---     1       0       1       0       0       0
5 Mayr    1       1       1       1       ---     0       1       1       1       1
6 WRO     0       0       1       0       0       ---     1       0       1       0
7 News    0       1       0       1       1       0       ---     0       0       0
8 UWay    1       1       0       1       1       0       1       ---     1       0
9 Welf    0       1       0       0       1       0       1       0       ---     0
10 West   1       1       1       0       1       0       1       0       0       ---
Measuring Similarity Concatenated Row & Column View
1 Coun  2 Comm  3 Educ  4 Indu  5 Mayr  6 WRO   7 News  8 UWay  9 Welf  10 West
---     1       0       1       1       0       0       1       0       1
1       ---     1       1       1       0       1       1       1       1
0       1       ---     0       1       1       0       0       0       1
0       1       1       ---     1       0       1       1       0       0
1       1       1       1       ---     0       1       1       1       1
0       0       1       0       0       ---     0       0       0       0
1       1       1       1       1       1       ---     1       1       1
0       1       0       0       1       0       0       ---     0       0
1       1       0       0       1       1       0       1       ---     0
0       0       1       0       1       0       0       0       0       ---
---     1       0       0       1       0       1       0       1       0
1       ---     1       1       1       0       1       1       1       0
0       1       ---     1       1       1       1       0       0       1
1       1       0       ---     1       0       1       0       0       0
1       1       1       1       ---     0       1       1       1       1
0       0       1       0       0       ---     1       0       1       0
0       1       0       1       1       0       ---     0       0       0
1       1       0       1       1       0       1       ---     1       0
0       1       0       0       1       0       1       0       ---     0
1       1       1       0       1       0       1       0       0       ---
Pearson correlation coefficients, covariances and cross-products
• Pearson correlation (ranges from -1 to +1) summarizes pairwise structural equivalence.
Pairwise Structural Equivalence
We can see, for example, that node 1 and node 9 have identical patterns of ties.
The Pearson correlation measure does not pay attention to the overall prevalence of ties (the mean of the row or column), and it does not pay attention to differences between actors in the variances of their ties.
Often it is desirable to focus only on the pattern, rather than on the mean and variance, as the aspect of similarity between actors.
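A sketch of the profile correlation: each actor's out-ties and in-ties are concatenated into one profile, and the profiles of two actors are correlated. The four-actor matrix is hypothetical, and skipping the entries that involve either of the two compared actors is an assumption about how the diagonal is treated.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def profile(matrix, i, j):
    """Out-ties followed by in-ties of actor i, skipping entries involving i or j."""
    others = [k for k in range(len(matrix)) if k not in (i, j)]
    return [matrix[i][k] for k in others] + [matrix[k][i] for k in others]

# Hypothetical network: actors 0 and 1 both send to 2 and both receive from 3.
M = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
r_01 = pearson(profile(M, 0, 1), profile(M, 1, 0))  # identical tie patterns
r_02 = pearson(profile(M, 0, 2), profile(M, 2, 0))  # different tie patterns
print(r_01, round(r_02, 3))
```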
[Figure: directed information exchange network, nodes 1–10]
Euclidean squared distances
Euclidean or squared Euclidean distances are not sensitive to the linearity of association and can be used with valued or binary data.
Other measures of this kind are the Jaccard and the Hamming distance.
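For binary tie profiles these distances are short to write down. The two profiles below are made-up examples, and the function names are illustrative.

```python
def squared_euclidean(x, y):
    """Sum of squared differences between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def hamming(x, y):
    """Number of positions in which two profiles differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def jaccard_distance(x, y):
    """1 - (ties both actors have) / (ties at least one actor has)."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1 - both / either if either else 0.0

# Two hypothetical binary tie profiles
x = [1, 0, 0, 1, 1]
y = [1, 1, 0, 0, 1]
d_euclid = squared_euclidean(x, y)
d_hamming = hamming(x, y)
d_jaccard = jaccard_distance(x, y)
print(d_euclid, d_hamming, d_jaccard)
```

For binary data the squared Euclidean distance and the Hamming distance coincide; the Jaccard distance differs because it ignores ties that both actors lack.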
Going from pairs to groups of structural equivalence
It is often useful to examine the similarities or distances to try to locate groupings of actors (that is, larger than a pair) who are similar. By studying the bigger patterns of which groups of actors are similar to which others, we may also gain some insight into "what about" the actors' positions is most critical in making them more similar or more distant.
In the next two sections we will cover how multi-dimensional scaling and hierarchical cluster analysis can be used to identify patterns in actor-by-actor similarity/distance matrices.
Both of these tools are widely used in non-network analysis; there are large and excellent literatures on the many important complexities of using these methods. Our goal here is to provide just a very basic introduction.
Hierarchical Clustering
• Hierarchical Clustering:
– Initially places each case in its own cluster
– The two most similar cases are then combined
– This process is repeated until all cases are agglomerated into a single cluster (once a case has been joined it is never re-classified)
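The agglomeration steps can be sketched in plain Python. The slide does not say how the distance between clusters is defined, so single linkage (distance between the closest members) is an assumption here, as is the small distance matrix.

```python
def agglomerate(dist):
    """Agglomerative clustering on a symmetric distance matrix.
    Returns the merge history as (members of the new cluster, merge distance)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (assumed): distance between the closest members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        history.append((sorted(merged), d))
        # Joined cases are never separated again
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical actor-by-actor distance matrix
D = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
history = agglomerate(D)
for members, distance in history:
    print(members, distance)
```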
Multi Dimensional Scaling
• MDS represents the patterns of similarity or dissimilarity in the profiles among the actors as a "map" in a multi-dimensional space. This map lets us see how "close" actors are and whether they "cluster".
– Stress is a measure of badness of fit
– The author has to determine the meaning of the dimensions
Finding automorphic equivalence (for binary data)
• Brute Force Approach: All the nodes of a graph are exchanged and the distances among all pairs of actors in the new graph are compared to the original one. When the new and the old graph have the same distances among nodes, the "swapping" that was done identified an automorphic position.
• Brute force is expensive: a graph with only 9 nodes already has 9! = 362,880 permutations.
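The brute-force idea can be sketched for a very small undirected graph. Instead of comparing distance matrices, this sketch checks whether a relabeling maps the edge set onto itself, which identifies the same permutations for an undirected graph; the seven-node tree is a hypothetical example.

```python
from itertools import permutations

def automorphisms(nodes, edges):
    """All relabelings of the nodes that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if mapped == edge_set:
            found.append(mapping)
    return found

def automorphic_classes(nodes, edges):
    """Nodes that can be swapped by some automorphism fall into the same class."""
    classes = {n: {n} for n in nodes}
    for mapping in automorphisms(nodes, edges):
        for n, image in mapping.items():
            classes[n].add(image)
    return {frozenset(c) for c in classes.values()}

# Hypothetical tree: A on top, B and C below, two leaves under each of them
tree_nodes = list("ABCDEFG")
tree_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E"), ("C", "F"), ("C", "G")]
n_automorphisms = len(automorphisms(tree_nodes, tree_edges))
classes = automorphic_classes(tree_nodes, tree_edges)
print(n_automorphisms, sorted(sorted(c) for c in classes))
```

Seven nodes already require 5,040 permutations, which illustrates why the brute-force approach does not scale.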
Regular Equivalence Block Matrix
Informal Definition: Two actors are regularly equivalent if they have similar patterns of ties to equivalent others.
Problem: The definition of each position depends on its relations with other positions. Where to start?
[Figure: example positions Sender, Repeater, Receiver]
Regular Equivalence Block Matrix: Block Image
• Create a matrix so that each actor in each partition has the same pattern of connection to actors in the other partition.
– Notice: We don’t care about ties among members of the same regular class!
– A sends to {BCD} but none of {EFGHI}
– {BCD} does not send to A but to {EFGHI}
– {EFGHI} does not send to A or {BCD}

            A     B,C,D   E,F,G,H,I
A           ---   1       0
B,C,D       0     ---     1
E,F,G,H,I   0     0       ---

    A    B    C    D    E    F    G    H    I
A   ---  1    1    1    0    0    0    0    0
B   0    ---  0    0    1    1    0    0    0
C   0    0    ---  0    0    0    1    0    0
D   0    0    0    ---  0    0    0    1    1
E   0    0    0    0    ---  0    0    0    0
F   0    0    0    0    0    ---  0    0    0
G   0    0    0    0    0    0    ---  0    0
H   0    0    0    0    0    0    0    ---  0
I   0    0    0    0    0    0    0    0    ---
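The block image can be derived from the actor-level ties and a partition, and the partition can be checked for regularity. This is a sketch: the ties are those of the matrix above, while the position names, function names and the counterexample partition are assumptions.

```python
# Ties of the example network: sender -> receivers
TIES = {"A": "BCD", "B": "EF", "C": "G", "D": "HI"}
PARTITION = {"sender": "A", "repeater": "BCD", "receiver": "EFGHI"}

def sends_to_block(actor, block):
    return any(t in block for t in TIES.get(actor, ""))

def receives_from_block(actor, block):
    return any(actor in TIES.get(s, "") for s in block)

def block_image(partition):
    """1 if members of one position send to members of another position, else 0."""
    image = {}
    for p, members_p in partition.items():
        for q, members_q in partition.items():
            if p != q:  # ties inside a position are ignored
                image[(p, q)] = int(any(sends_to_block(a, members_q) for a in members_p))
    return image

def is_regular(partition):
    """All members of a position must relate to every other position in the same way."""
    for p, members_p in partition.items():
        for q, members_q in partition.items():
            if p == q:
                continue
            sends = {sends_to_block(a, members_q) for a in members_p}
            receives = {receives_from_block(a, members_q) for a in members_p}
            if len(sends) > 1 or len(receives) > 1:
                return False
    return True

image = block_image(PARTITION)
regular = is_regular(PARTITION)
# Hypothetical counterexample: lumping A and B together is not a regular partition
irregular = is_regular({"x": "AB", "y": "CDEFGHI"})
print(image, regular, irregular)
```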
Algorithms for detection of Regular Equivalence: Tabu Search
• This method of blocking relies on extensive use of the computer. Tabu search tries to implement the same idea of grouping together actors who are most similar into a block.
• Tabu search does this by searching for sets of actors who, if placed into a block, produce the smallest sum of within-block variances in the tie profiles.
• If actors in a block have similar ties, their variance around the block mean profile will be small.
• So, the partitioning that minimizes the sum of within-block variances is minimizing the overall variance in tie profiles.
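The quantity being minimized can be illustrated without the search itself. The sketch below only evaluates the objective for two candidate partitions; it uses the sum of squared deviations around the block mean profile as a stand-in for the within-block variance, and the tie profiles are hypothetical. It does not implement the tabu search procedure.

```python
def within_block_variation(profiles, blocks):
    """Sum of squared deviations of each actor's tie profile from its block mean."""
    total = 0.0
    for block in blocks:
        n_cols = len(profiles[block[0]])
        for col in range(n_cols):
            values = [profiles[actor][col] for actor in block]
            mean = sum(values) / len(values)
            total += sum((v - mean) ** 2 for v in values)
    return total

# Hypothetical tie profiles of four actors
profiles = {
    1: [1, 1, 0],
    2: [1, 1, 0],
    3: [0, 0, 1],
    4: [0, 1, 1],
}
good = within_block_variation(profiles, [[1, 2], [3, 4]])  # similar actors together
bad = within_block_variation(profiles, [[1, 3], [2, 4]])   # dissimilar actors together
print(good, bad)
```

A search procedure would move actors between blocks and keep the partition with the smallest value of this objective.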
Algorithms for detection of Regular Equivalence: Tabu Search Results
[Figure: directed information exchange network, nodes 1–10]
Actors 2 and 5, for example, are pure "repeaters".
The set {6, 10, 3} sends to only two other types (not all three other types) and receives from only one other type.
Blockmodeling
Blockmodeling is able to include all kinds of equivalences into one analysis
Examples of blocks:
• Complete blocks (everybody is connected with each other inside the block)
• Null blocks (people in this block are not connected to anybody)
• Regular blocks (people in this block share the same regular equivalence class)
Blockmodels Matrix Permutation
Blockmodels
Student Government. Discussion relation among the eleven students who were members of the student government at the University of Ljubljana in Slovenia. The students were asked to indicate with which of their fellows they informally discussed matters concerning the administration of the university.
General Blockmodelling with predefined partitions
Blockmodeling based on actors-attributes
Blockmodels Matrix Representation
Blockmodels Matrix Permutation
2. Network Measures 2.3 Network Measures for Subgroups: Cohesive Subgroups
Cohesive Subgroups
Cohesive subgroups: We hypothesize that cohesive subgroups are the basis for solidarity, shared norms, identity and collective behavior. Perceived similarity, for instance, membership of a social group, is expected to promote interaction. We expect similar people to interact a lot, at least more often than with dissimilar people.
Example – Families in Haciendas (1948)
Each arc represents "frequent visits" from one family to another.
Components
A semiwalk from vertex u to vertex v is a sequence of lines such that the end vertex of one line is the starting vertex of the next line and the sequence starts at vertex u and ends at vertex v.
A walk is a semiwalk with the additional condition that none of its lines is an arc whose end vertex is the arc's tail (that is, every arc is followed in its direction).
Note that v5 v3 v4 v5 v3 is also a walk to v3
Paths
A semipath is a semiwalk in which no vertex in between the first and last vertex of the semiwalk occurs more than once.
A path is a walk in which no vertex in between the first and last vertex of the walk occurs more than once.
Connectedness
A network is (weakly) connected if each pair of vertices is connected by a semipath.
A network is strongly connected if each pair of vertices is connected by a path.
This network is not connected because v2 is isolated.
Connected Components
A (weak) component is a maximal (weakly) connected subnetwork.
A strong component is a maximal strongly connected subnetwork.
v1, v3, v4, v5 form a weak component
v3, v4, v5 form a strong component
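Weak and strong components can be found with a simple reachability search. The arcs below are an assumption chosen to be consistent with the statements on these slides (v2 isolated, v3, v4 and v5 on a directed cycle, v1 sending to v3); the function names are illustrative.

```python
def reachable(start, step):
    """All vertices that can be reached from start by following step[vertex]."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in step[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def components(nodes, arcs, strong):
    succ = {n: set() for n in nodes}
    pred = {n: set() for n in nodes}
    for u, v in arcs:
        succ[u].add(v)
        pred[v].add(u)
    if strong:
        # Strong: vertices that reach each other along paths in both directions
        return {reachable(n, succ) & reachable(n, pred) for n in nodes}
    # Weak: arc direction is ignored (semipaths)
    either = {n: succ[n] | pred[n] for n in nodes}
    return {reachable(n, either) for n in nodes}

# Assumed example network
nodes = ["v1", "v2", "v3", "v4", "v5"]
arcs = [("v1", "v3"), ("v3", "v4"), ("v4", "v5"), ("v5", "v3")]
weak = components(nodes, arcs, strong=False)
strong = components(nodes, arcs, strong=True)
print(sorted(sorted(c) for c in weak))
print(sorted(sorted(c) for c in strong))
```

Isolated vertices such as v2 show up as components of size one in this sketch.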
Example Strong Components
1. Net > Components > {Strong, Weak}
Cliques and Complete Subnetworks
A clique is a maximal complete subnetwork containing three vertices or more. (cliques can overlap)
v1,v6,v5 is a clique
v2,v4,v5 is not a clique
v2,v3,v4,v5 is a clique
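Maximal complete subnetworks can be enumerated with the Bron-Kerbosch algorithm (named here as the technique used in this sketch, not as the method of any particular software). The edge list is an assumption that reproduces the statements above: v1, v5, v6 form a triangle and v2, v3, v4, v5 are completely connected.

```python
def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def maximal_cliques(adj, min_size=3):
    """Bron-Kerbosch enumeration of maximal complete subnetworks."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            # r cannot be extended any further, so it is maximal
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

# Assumed example network (vertex numbers stand for v1 ... v6)
edges = [(1, 6), (1, 5), (5, 6),
         (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
cliques = set(maximal_cliques(adjacency(edges)))
print(sorted(sorted(c) for c in cliques))
```

The set {2, 4, 5} is complete but not maximal, which is why it is not reported as a clique.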
n-Clique & n-Clan
[Figure: example of a 2-clique and a 2-clan]
n-Clique: A maximal subgraph in which every pair of nodes is at most at distance n, with distances measured in the analyzed (original) graph. A clique is an n-clique with n = 1.
n-Clan: An n-clique in which every pair of nodes is also at most at distance n within the resulting subgraph.
n-Clans & n-Cliques
2-Clans: 123,234,345,456,561,612
2-Cliques: 123,234,345,456,561,612 and 135,246
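The two lists can be reproduced by brute force over all subsets. This sketch assumes that the example graph is a ring of the six nodes 1 to 6, which yields exactly the 2-clans and 2-cliques listed above; the function names are illustrative.

```python
from itertools import combinations

def distances(nodes, edges):
    """All-pairs shortest path lengths by breadth-first search."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {}
    for s in nodes:
        dist[s] = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist[s]:
                        dist[s][w] = dist[s][u] + 1
                        nxt.append(w)
            frontier = nxt
    return dist

def n_cliques(nodes, edges, n):
    """Maximal node sets whose members are pairwise within distance n in the whole graph."""
    dist = distances(nodes, edges)
    ok = [set(c) for size in range(2, len(nodes) + 1)
          for c in combinations(nodes, size)
          if all(dist[u].get(v, n + 1) <= n for u, v in combinations(c, 2))]
    return [c for c in ok if not any(c < other for other in ok)]

def n_clans(nodes, edges, n):
    """n-cliques whose members are also within distance n inside the subgraph."""
    result = []
    for c in n_cliques(nodes, edges, n):
        inner = distances(sorted(c), [(u, v) for u, v in edges if u in c and v in c])
        if all(inner[u].get(v, n + 1) <= n for u, v in combinations(sorted(c), 2)):
            result.append(c)
    return result

NODES = [1, 2, 3, 4, 5, 6]
RING = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]  # assumed example graph
cliques2 = n_cliques(NODES, RING, 2)
clans2 = n_clans(NODES, RING, 2)
print(len(cliques2), len(clans2))
```

The sets 135 and 246 are 2-cliques but not 2-clans, because their members are only connected through nodes outside the set.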
[Figure: example graph with nodes 1–6]
k-Plexes
2-Plexes: 1234, 2345, 3456, 4561, 5612, 6123
In general, k-plexes are more robust than cliques and clans.
[Figure: example graph with nodes 1–6]
k-Plex: A k-plex is a maximal subgraph with g_s nodes in which each node is connected to at least g_s - k of the nodes.
Overview Subgroups
[Figure: five example graphs on the nodes 1, 2, 3 and 4]
• Graph 1: 2 components
• Graph 2: 1 component, 2 2-clans (341, 412), 2 2-cliques (341, 412)
• Graph 3: 1 component, 1 2-clan (124), 1 2-clique (124)
• Graph 4: 1 component, 1 2-clan (1234), 1 2-clique (1234), 1 2-plex (1234)
• Graph 5: 1 component, 1 2-clan (1234), 1 2-clique (1234), 1 2-plex (1234), 1 clique
Overview Group Concepts
• 1-Clique, 1-Clan and 1-Plex are identical
• An n-clan is always included in a higher order n-clique
[Figure: group concepts Component, 2-Clique, 2-Clan, 2-Plex, Clique]
k-Cores
• A k-core is a maximal subnetwork in which each vertex has at least degree k within the subnetwork.
k-Cores
k-cores are nested which means that a vertex in a 3-core is also part of a 2-core but not all members of a 2-core belong to a 3-core.
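The nesting can be made visible by computing, for every vertex, the highest k for which it still belongs to a k-core. The sketch uses the usual peeling idea (repeatedly remove a vertex of lowest remaining degree); the six-vertex network and the function name are assumptions.

```python
def core_numbers(adj):
    """Highest k such that the vertex belongs to a k-core (undirected network)."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a vertex with the lowest remaining degree
        v = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Hypothetical network: vertices 1-4 are completely connected, 5 hangs on 4, 6 hangs on 5
adj = {
    1: {2, 3, 4},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3, 5},
    5: {4, 6},
    6: {5},
}
core = core_numbers(adj)
three_core = {v for v, k in core.items() if k >= 3}
two_core = {v for v, k in core.items() if k >= 2}
print(core, three_core, two_core)
```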
k-Cores Application
• k-cores help to detect cohesive subgroups by removing the lowest k-cores from the network until the network breaks up into relatively dense components.
• Net > Partitions > Core >{Input, Output, All}
3. Network Mechanisms
Network Mechanisms
• Tie Outdegree Effect
• Reciprocity
• Transitivity & Three-Cycles Effect
• Balance Effect
• In/Out Popularity Effect
• In/Out Activity Effect
• In/Out Assortativity Effect
• Covariate Similarity Effect
• Covariate Ego-Effect
• Covariate Alter-Effect
• Same Covariate Effect
Outdegree Effect
• The most basic effect is defined by the outdegree of actor i. It represents the basic tendency to have ties at all.
• In a decision-theoretic approach this effect can be regarded as the balance of benefits and costs of an arbitrary tie.
• Most networks are sparse (i.e., they have a density well below 0.5), which can be represented by saying that for a tie to an arbitrary other actor (arbitrary meaning here that the other actor has no characteristics or tie pattern making him/her especially attractive to i) the costs will usually outweigh the benefits. Indeed, in most cases a negative parameter is obtained for the outdegree effect.
Reciprocity Effect
• Another quite basic effect is the tendency toward reciprocity, represented by the number of reciprocated ties of actor i. This is a basic feature of most social networks (cf. Wasserman and Faust, 1994, Chapter 13)
[Figure: reciprocated tie between actors i and j]
Transitivity and other triadic effects
• Next to reciprocity, an essential feature in most social networks is the tendency toward transitivity, or transitive closure (sometimes called clustering): friends of friends become friends, or in graph-theoretic terminology: two-paths tend to be, or to become, closed (e.g., Davis 1970, Holland and Leinhardt 1971).
[Figure: transitive triplet and three-cycle among actors i, j and h]
Balance Effect
• An effect closely related to transitivity is balance (Newcomb, 1962), which is the same as structural equivalence with respect to out-ties (Burt, 1982): the tendency to have and create ties to other actors who make the same choices as ego.
[Figure: example with actors A, B, C and D]
In/Out Popularity Effect
• The degree-related popularity effect is based on the indegree or outdegree of an actor. Nodes with higher indegree, or higher outdegree, are more attractive for others to send a tie to.
• That implies that high indegrees reinforce themselves, which will lead to a relatively high dispersion of the indegrees (a Matthew effect in popularity as measured by indegrees, cf. Merton, 1968 and Price, 1976).
[Figure: example with actors A, B, C and D]
In/Out Activity Effect
• Nodes with higher indegree, or higher outdegree respectively, will have an extra propensity to form ties to others.
• The outdegree-related activity effect again is a self-reinforcing effect: when it has a positive parameter, the dispersion of outdegrees will tend to increase over time, or to be sustained if it already is high.
[Figure: example with actors A, B, C and D]
Preferential Attachment
• Notice: These four degree-related effects can be regarded as the analogues in the case of directed relations of what was called cumulative advantage by Price (1976) and preferential attachment by Barabasi and Albert (1999) in their models for dynamics of non-directed networks: a self-reinforcing process of degree differentiation.
In/Out Assortativity Effect
• Preferences of actors depend on their degrees. Depending on their own out- and in-degrees, actors can have differential preferences for ties to others who also have high or low out- and in-degrees (Morris and Kretzschmar 1995; Newman 2002).
[Figure: example with actors A to F]
Covariate Similarity Effect
• The covariate similarity effect describes whether ties tend to occur more often between actors with similar values on a covariate (homophily effect). Tendencies to homophily constitute a fundamental characteristic of many social relations, see McPherson, Smith-Lovin, and Cook (2001).
• Example: iPad owners tend to be friends with other iPad owners.
Covariate Ego Effect
• The covariate ego effect describes whether actors with higher values on a covariate tend to nominate more friends and hence have a higher outdegree.
• Example: Heavier smokers have more friends.
Covariate Alter Effect
• The covariate alter effect describes whether actors with higher values on a covariate V tend to be nominated by more others and hence have higher indegrees.
• Example: Beautiful people have more friends.
Modeling networks
1. Actor-based modeling for longitudinal data
– SIENA (analysis of repeated measures on social networks and MCMC estimation of exponential random graphs)
2. Stochastic modeling for panels
– Pnet
objective function   Model 1          Model 2          Model 3
                     estim s.e. p     estim s.e. p     estim s.e. p
outdegree (density) -2,46 0,12 <0,0001* -4,04 0,23 <0,0001* -1,99 0,13 <0,0001*
reciprocity 2,57 0,20 <0,0001* 2,29 0,22 <0,0001* 3,02 0,21 <0,0001*
transitive triplets 0,07 0,01 <0,0001*
transitive mediated triplets -0,03 0,01 0,0005*
transitive ties 1,47 0,24 <0,0001*
3-cycles -0,06 0,02 0,0037*
attribute party 1,13 0,15 <0,0001* 0,73 0,15 <0,0001*
attribute gender -0,11 0,15 0.48
4. Network Theories
• Homophily & Assortativity
• Power Laws & Preferential Attachment
• The Strength of Weak Ties
• Small Worlds
• Social Capital
4.1 Homophily
Homophily
• Homophily (i.e., love of the same) is the tendency of individuals to associate and bond with similar others. (Mechanisms of selection vs influence)
• In the study of networks, assortative mixing is a bias in favor of connections between network nodes with similar characteristics. In the specific case of social networks, assortative mixing is also known as homophily. The rarer disassortative mixing is a bias in favor of connections between dissimilar nodes.
Low Homophily High Homophily
Homophily II
Types (acc. to McPherson et al. 2001):
– Race and Ethnicity (Marsden 1987, 1988; Louch 2000; Kalleberg et al. 1996; Laumann 1973 …)
– Sex and Gender (Maccoby 1998; Eder & Hallinan 1978; Shrum et al. 1988; Huckfeldt & Sprague 1995; Brass 1985 …)
– Age (Fischer 1977, 1982; Feld 1982; Blau et al. 1991; Burt 1990, 1991 …)
– Religion (Laumann 1973; Verbrugge 1977; Fischer 1977, 1982; Marsden 1988; Louch 2000 …)
– Education, Occupation and Social Class (Laumann 1973; Marsden 1987; Verbrugge 1977; Wright 1997; Kalmijn 1998 …)
– Network Positions (Brass 1985; Burt 1982; Friedkin 1993 …)
– Behavior (Cohen 1977; Kandel 1978; Knoke 1990 …)
– Attitudes, Abilities, Beliefs and Aspirations (Jussim & Osgood 1989; Huckfeldt & Sprague 1995; Verbrugge 1977, 1983; Knoke 1990)
Schelling's Segregation Demo
4.2 Power Laws & Preferential Attachment
Power Law distribution
• As a function of k, what fraction of pages on the Web have k in-links?
• A natural guess: the normal, or Gaussian, distribution
• Central Limit Theorem (roughly): if we take any sequence of small independent random quantities, then in the limit their sum will be distributed according to the normal distribution
Power Law distribution
But when people measured the Web, they found something very different: the fraction of Web pages that have k in-links is approximately proportional to 1/k^2
• Power law function
• Popularity exhibits extreme imbalances: there are a few very popular Web pages that have extremely many in-links
True for other domains:
• the fraction of telephone numbers that receive k calls per day: roughly proportional to 1/k^2
• the fraction of books bought by k people: roughly proportional to 1/k^3
• the fraction of scientific papers that receive k citations: roughly proportional to 1/k^3
Preferential attachment leads to power laws
• A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. Notice: "Preferential attachment" (A.-L. Barabasi and R. Albert 1999) is only the most recent of many names that have been given to such processes.
• Notice: Preferential attachment can, under suitable circumstances, generate power law distributions.
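A rich-get-richer process of this kind can be simulated in a few lines. This is a sketch under assumptions: each new page creates exactly one link, and an existing page is chosen with probability proportional to its number of in-links plus one (the "+1" gives pages without in-links a chance). The function name and parameters are illustrative.

```python
import random
from collections import Counter

def simulate_in_links(n_pages, seed=0):
    """Grow a network page by page; each new page links to one existing page,
    chosen with probability proportional to (in-links + 1)."""
    rng = random.Random(seed)
    in_links = [0]
    urn = [0]  # page i appears (in_links[i] + 1) times in the urn
    for new_page in range(1, n_pages):
        target = rng.choice(urn)
        in_links[target] += 1
        urn.append(target)    # the target gained one in-link
        in_links.append(0)
        urn.append(new_page)  # the new page enters with weight 1
    return in_links

in_links = simulate_in_links(1000)
distribution = Counter(in_links)  # number of pages per in-link count
print(max(in_links), distribution[0], distribution[1])
```

Plotting the resulting distribution on log-log axes is the usual way to inspect whether it follows an approximately straight line, as a power law would.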
Preferential Attachment Demo
DEMO with NETLOGO
4.3 Balance Theory
Balance Theory: Fritz Heider
Fritz Heider (1940): A person (P) feels uncomfortable when he or she disagrees with his or her friend (O) on a topic (X).
P feels an urge to change this imbalance. He can adjust his opinion, change his affection for O, or convince himself that O is not really opposed to X.
Balance Theory
(a) + + + : three people are mutual friends
(b) + + - : A is a friend of B and C, but B and C are enemies
(c) - - + : two people are friends, and they have a mutual enemy in the third
(d) - - - : all are enemies; this motivates two of them to “team up” against the third
(b) and (d) represent unstable relationships
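The four cases follow one rule: a triad is balanced when the product of its three signs is positive. A minimal sketch, with +1 for friendship and -1 for enmity (the encoding and the function name are assumptions):

```python
def is_balanced(signs):
    """A triad is balanced if the product of its tie signs is positive."""
    product = 1
    for s in signs:
        product *= s
    return product > 0

triads = {
    "a": (+1, +1, +1),  # three mutual friends
    "b": (+1, +1, -1),  # one person is friends with two enemies
    "c": (-1, -1, +1),  # two friends with a common enemy
    "d": (-1, -1, -1),  # all enemies
}
balance = {name: is_balanced(signs) for name, signs in triads.items()}
print(balance)
```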
Balance Theory Community in a New England Monastery
Young Turks (1), Loyal Opposition (2), Outcasts (3), Interstitial Group (4)
Balance Theory International Relations
4.4 Strength of Weak Ties
Strength of Weak Ties: Mark Granovetter
• “One of the most influential sociology papers ever written” (Barabasi)
– One of the most cited (Current Contents, 1986)
• Accepted by the American Journal of Sociology after 4 years of unsuccessful attempts elsewhere.
• Interviewed people and asked: “How did you find your job?”
– Kept getting the same answer: “through an acquaintance, not a friend”
Basic Argument
• Classify interpersonal relations as “strong”, “weak”, or “absent”
• Strength is (vaguely) defined as “a (probably linear) combination of…
– the amount of time,
– the emotional intensity,
– the intimacy (mutual confiding),
– and the reciprocal services which characterize the tie”
• The stronger the tie between two individuals, the larger the proportion of people to which they are both tied (weakly or strongly)
Strong Ties
• If person A has a strong tie to both B and C, then it is unlikely for B and C not to share a tie.
[Figure: triad in which A has strong ties to both B and C]
Weak Ties for Information Diffusion
“Intuitively speaking, this means that whatever is to be diffused can reach a larger number of people, and traverse greater social distance, when passed through weak ties rather than strong.”
4.5 Small World Phenomenon
Connectivity and the Small World
1. Travers and Milgram’s work on the small world is responsible for the standard belief that “everyone is connected by a chain of about 6 steps.”
2. Two questions:
– Given what we know about networks, what is the longest path (defined by handshakes) that separates any two people?
– Is 6 steps a long distance or a short distance?
Example: Two Hermits on opposite sides of the country
[Figure: chain of acquaintances] OH Hermit - Store Owner - Truck Driver - Manager - Corporate Manager - Corporate President - Congress Rep. - Congress Rep. - Corporate President - Corporate Manager - Manager - Truck Driver - Store Owner - Mt. Hermit
Milgram’s Test
Milgram’s test: Send a packet from sets of randomly selected people to a stockbroker in Boston.
Experimental setup: Arbitrarily select people from 3 pools:
– People in Boston
– Random people in Nebraska
– Stockholders in Nebraska
Results
• Most chains found their way through a small number of intermediaries.
• What do these two findings tell us about the global structure of social relations?
Results II
1. Social networks contain a lot of short paths
2. People acting without any sort of global ‘map’ are effective at collectively finding these short paths
The Watts-Strogatz model
• Two main principles explaining short paths: homophily and weak ties
• Homophily: every node forms a link to all other nodes that lie within a radius of r grid steps
• Weak ties: each node forms a link to k other random nodes
• Suppose everyone lives on a two-dimensional grid (as a model of geographic proximity)
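The grid model above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the original authors' implementation: the function names, the use of Manhattan distance for "grid steps", and the parameter values in the usage example are assumptions.

```python
import random
from collections import deque

def build_grid_network(n, r, k, seed=0):
    """n x n grid; local links within r grid steps plus k random links per node."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    adj = {v: set() for v in nodes}
    for (x, y) in nodes:
        # Homophily: link to every node within r grid steps.
        for dx in range(-r, r + 1):
            for dy in range(-(r - abs(dx)), r - abs(dx) + 1):
                w = (x + dx, y + dy)
                if w != (x, y) and w in adj:
                    adj[(x, y)].add(w)
                    adj[w].add((x, y))
        # Weak ties: k nodes drawn uniformly at random (a self-draw is skipped).
        for w in rng.sample(nodes, k):
            if w != (x, y):
                adj[(x, y)].add(w)
                adj[w].add((x, y))
    return adj

def average_path_length(adj, sources):
    """Mean shortest-path length from the given sources to all reachable nodes."""
    total, count = 0, 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

sources = [(0, 0), (10, 10)]
grid_only = build_grid_network(20, r=1, k=0)
with_weak_ties = build_grid_network(20, r=1, k=1)
print(average_path_length(grid_only, sources))
print(average_path_length(with_weak_ties, sources))
```

Comparing the two printed values shows how much a single random link per node shortens the typical path on this small grid.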
Watts-Strogatz
The Watts-Strogatz model
• Suppose we allow only one out of every k nodes to have a single random friend
• A k × k square then has k random links; consider it as a single node
• A surprisingly small amount of randomness is enough to make the world “small”, with short paths between every pair of nodes
Decentralized Search
• People are able to collectively find short paths to the designated target even though they do not know the global ‘map’ of all connections
• Breadth-first search vs. tunneling
• Modeling:
– Can we construct a network where decentralized search succeeds?
– If yes, what are the qualitative properties of such a network?
A model for decentralized search
• A starting node s is given a message that it must forward to a target node t
• s knows only the location of t on the grid, but s doesn’t know the edges out of any other node
• Model must span all the intermediate ranges of scale as well
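A simple forwarding rule consistent with these constraints is greedy routing: each holder of the message passes it to whichever of its own contacts lies closest to t on the grid. The sketch below assumes this rule; the function names and the 4 × 4 example with one long-range contact are hypothetical.

```python
def grid_distance(v, w):
    """Number of grid steps between v and w (Manhattan distance)."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])

def greedy_route(adj, s, t):
    """Forward from s toward t; each holder inspects only its own contacts."""
    path = [s]
    current = s
    while current != t:
        nxt = min(adj[current], key=lambda w: grid_distance(w, t))
        if grid_distance(nxt, t) >= grid_distance(current, t):
            break  # no contact is closer to t; cannot happen on a full grid
        path.append(nxt)
        current = nxt
    return path

# 4 x 4 grid in which every node knows its grid neighbours.
n = 4
adj = {(x, y): set() for x in range(n) for y in range(n)}
for (x, y) in adj:
    for w in ((x + 1, y), (x, y + 1)):
        if w in adj:
            adj[(x, y)].add(w)
            adj[w].add((x, y))

# One hypothetical long-range contact, known only to (0, 0).
adj[(0, 0)].add((3, 2))

print(greedy_route(adj, (0, 0), (3, 3)))  # [(0, 0), (3, 2), (3, 3)]
```

Because the distance to t shrinks with every hop, the loop always terminates; whether the resulting path is short depends on how the long-range contacts are distributed.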
Modeling the process of decentralized search
• We adapt the model by introducing a clustering exponent q
• For two nodes v and w, let d(v,w) be the number of grid steps between them
• Random edges are now generated with probability proportional to $d(v,w)^{-q}$
• The model changes with different values of q:
– q = 0: links are chosen uniformly at random
– when q is very small: long-range links are “too random”
– when q is large: long-range links are “not random enough”
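The effect of q on the long-range links can be seen by sampling contacts with weights $d(v,w)^{-q}$ and looking at how far away they land. This is a minimal sketch; the function names, the 21 × 21 grid, and the sample size are assumptions chosen for illustration.

```python
import random

def grid_distance(v, w):
    """Number of grid steps between v and w (Manhattan distance)."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])

def sample_long_range_contact(v, nodes, q, rng):
    """Pick one contact w != v with probability proportional to d(v,w)^(-q)."""
    candidates = [w for w in nodes if w != v]
    weights = [grid_distance(v, w) ** (-q) for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def mean_contact_distance(q, n=21, samples=2000, seed=0):
    """Average grid distance of sampled contacts of the centre node."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    v = (n // 2, n // 2)
    total = sum(
        grid_distance(v, sample_long_range_contact(v, nodes, q, rng))
        for _ in range(samples)
    )
    return total / samples

for q in (0, 2, 5):
    print(q, round(mean_contact_distance(q), 2))
```

With q = 0 the contacts are spread uniformly over the grid; as q grows they concentrate near the node itself, which matches the “too random” versus “not random enough” contrast above.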
Varying clustering exponent
Decentralized Search when q=2
Experiments show that decentralized search is most efficient when q = 2 (random links follow an inverse-square distribution)
What’s special about q=2
• Since area in the plane grows like the square of the radius, the total number of nodes at distances between d and 2d from a given node is proportional to $d^2$
• Each of these nodes is linked with probability proportional to $d^{-2}$, so the probability that a random edge links into some node in this ring is approximately independent of the value of d
• Long-range weak ties are being formed in a way that’s spread roughly uniformly over all different scales of resolution
Think of the postal system: country, state, city, street, and finally the street number
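The scale argument can be checked numerically. On an unbounded grid with Manhattan distance, exactly 4j nodes lie j steps away from a given node, so the unnormalised probability mass falling on the ring of distances d to 2d − 1 can be summed directly. The function name below is hypothetical, and the use of Manhattan distance is an assumption.

```python
def ring_weight(d, q):
    """Unnormalised link probability mass on nodes at grid distance d .. 2d-1."""
    # On an unbounded grid, exactly 4*j nodes lie j grid steps away (j >= 1),
    # and each of them is chosen with weight j^(-q).
    return sum(4 * j * j ** (-q) for j in range(d, 2 * d))

for q in (0, 2, 4):
    print(q, [round(ring_weight(d, q), 3) for d in (1, 2, 4, 8, 16, 32)])
```

For q = 2 the values level off as d doubles, whereas for q = 0 they grow with d and for q = 4 they shrink, which is the sense in which q = 2 spreads links evenly over all scales.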
Small-World Phenomenon Conclusions I
1. Start from Milgram’s experiment: (1) there seem to be short paths and (2) people know how to find them effectively
2. Build mathematical models for (1) and (2)
3. Make a prediction based on the models: clustering exponent q=2
4. Validate this prediction using real data from large social networks (LiveJournal, Facebook)
Why do social networks arrange themselves in a pattern of friendships across distance that is close to optimal for forwarding messages to far-off targets?
Small-World Phenomenon Conclusions II
• If there are dynamic forces or selective pressures driving the network toward this shape, they must be more implicit, and it remains a fascinating open problem to determine whether such forces exist and how they might operate.
• Robustness, Search, Spread of disease, opinion formation, spread of computer viruses, gossip,…
• For example: Diseases move more slowly in highly clustered graphs
• The dynamics are highly non-linear, with no clear pattern based on local connectivity.
Implication: small local changes (shortcuts) can have dramatic global outcomes (think of disease diffusion)
Small World Construction
• Network changes from structured to random
• With 6 billion nodes, the average path length L starts at 3 million and decreases to 4 (!)
• Clustering: starts at 0.75, decreases to zero
• Most important is what happens ALONG the way.
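The transition from a structured to a random network can be reproduced at small scale by rewiring a ring lattice. The sketch below is a simplified version of that construction in plain Python, not a reference implementation: the function names, the rewiring details, and the sizes (200 nodes, 6 neighbours each) are assumptions for illustration.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node is tied to its k nearest neighbours on a ring (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, p, rng):
    """With probability p, move the far end of each lattice tie to a random node."""
    n = len(adj)
    new = {i: set(nb) for i, nb in adj.items()}
    for i in range(n):
        for j in sorted(adj[i]):
            # Visit each original tie once (from its clockwise start), then flip a coin.
            if (j - i) % n > n // 2 or rng.random() >= p:
                continue
            choices = [w for w in range(n) if w != i and w not in new[i]]
            if choices:
                w = rng.choice(choices)
                new[i].discard(j)
                new[j].discard(i)
                new[i].add(w)
                new[w].add(i)
    return new

def average_clustering(adj):
    """Mean share of a node's neighbour pairs that are themselves tied."""
    total = 0.0
    for v, nb in adj.items():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs (BFS from every node)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(1)
lattice = ring_lattice(200, 6)
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(lattice, p, rng)
    print(p, round(average_path_length(g), 2), round(average_clustering(g), 2))
```

Printing path length and clustering side by side for intermediate values of p is a quick way to look at what happens along the way between the two extremes.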
Small worlds demo
Interactive Summary
• The biggest advantage I can gain by using SNA is…
• The most important fact about SNA for me is…
• The concept that made the most sense for me today was…
• The biggest danger in using SNA is…
• If I use SNA in the future, I will try to make sure that…
• If I use SNA in my next project, I will use it for…
• I should change my perspective on networks by considering…
• I have changed my opinion about SNA, finding out that…
• I missed today that…
• Before attending this seminar I didn't know that…
• I wish we could have covered…
• If I forget almost everything that I learned today, I will still remember…
• The most important thing today for me was…
Thanks for your attention!
Questions & Discussion