
Social Network Analysis

Fundamental Concepts in Social Network Analysis (Part 2)

Katarina Stanoevska-Slabeva, Miriam Meckel, Thomas Plotkowiak

© Thomas Plotkowiak 2010

Agenda

1. Intro

2. Measuring Networks – Embedding Measures (Ties)

– Positions and Roles (Nodes)

– Group Concepts

3. Network Mechanisms

4. Network Theories


Introduction Knoke information exchange network

In 1978, Knoke & Wood collected data from workers at 95 organizations in Indianapolis. Respondents indicated with which other organizations their own organization had any of 13 different types of relationships.

The example network used here is the exchange of information among ten organizations that were involved in the local political economy of social welfare services in a Midwestern city.

2. Network Measures 2.1 Network Measures for Actors Embedding Measures


Embedding Measures

• Reciprocity (Dyad Census)

• Transitivity (Triad Census)

• Clustering

• Density

• Group-external and group-internal Ties

• Other Network Mechanisms


Reciprocity

• With symmetric data two actors are either connected or not.

• With directed data there are four possible dyadic relationships: – A and B are not connected

– A sends to B

– B sends to A

– A and B send to each other.


Reciprocity II

• What is the reciprocity in this network?

– Answer 1: pairs that have reciprocated ties / all possible pairs

• AB of {AB, AC, BC} = 0.33

– Answer 2: pairs that have reciprocated ties / existing (connected) pairs

• AB of {AB, BC} = 0.5

– Answer 3: reciprocated directed ties / all possible directed ties

• {AB, BA} of {AB, BA, AC, CA, BC, CB} = 0.33
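The three answers can be reproduced in a few lines. This is a minimal sketch that assumes the pictured network has the arcs A→B, B→A and B→C (the figure is not reproduced here, so the arc list is an assumption consistent with the numbers on the slide).

```python
from itertools import combinations

# Assumed arcs for the three-actor example: A<->B reciprocated, B->C one-way.
arcs = {("A", "B"), ("B", "A"), ("B", "C")}
nodes = ["A", "B", "C"]

pairs = list(combinations(nodes, 2))  # all possible dyads
connected = [p for p in pairs if (p[0], p[1]) in arcs or (p[1], p[0]) in arcs]
mutual = [p for p in pairs if (p[0], p[1]) in arcs and (p[1], p[0]) in arcs]

answer_1 = len(mutual) / len(pairs)            # reciprocated dyads / all possible dyads
answer_2 = len(mutual) / len(connected)        # reciprocated dyads / connected dyads
answer_3 = 2 * len(mutual) / (2 * len(pairs))  # reciprocated arcs / all possible arcs

print(round(answer_1, 2), round(answer_2, 2), round(answer_3, 2))
```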


Transitivity

• With undirected data there are four possible types of triadic relations

– No ties

– One tie

– Two ties

– Three ties

• The count of the relative prevalence of these four types of relations is called the "triad census". A population can be characterized by:

– "isolation"

– "couples only"

– "structural holes" (one actor is connected to two others, who are not connected to each other)

– or "clusters"
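A triad census for undirected data can be sketched as follows. The function name and the four-actor example graph are hypothetical; the code simply counts, for every set of three actors, how many of the three possible ties are present.

```python
from itertools import combinations

def triad_census(nodes, edges):
    """Count triads with 0, 1, 2 and 3 ties in an undirected graph."""
    edge_set = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for triple in combinations(nodes, 3):
        ties = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        census[ties] += 1
    return census

# Hypothetical example: a triangle 1-2-3 plus a pendant tie 3-4.
census = triad_census([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (3, 4)])
print(census)  # index = number of ties in the triad
```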


Transitivity II Directed Networks

M-A-N number: M = # of mutual (positive) dyads, A = # of asymmetric dyads, N = # of null dyads

D = Down, U = Up, C = Cyclic, T = Transitive


Triad Census Models

Balance Model with Two Cliques (Heider Balance)

Triads either 300 or 102

Linear Hierarchy Model Every triad is 030T

Ranked Clusters Model (Hierarchy of Cliques) Triads: 300, 102, 003, 120D, 120U, 030T, 021D, 021U



Example Directed information exchange network

The exchange of information among ten organizations that were involved in the local political economy of social welfare services in a Midwestern city.

[Figure: directed graph of the information exchange network, nodes 1 to 10]


Transitivity III

• How to measure transitivity?

– A) Divide the number of transitive triads found by the total number of possible triplets (for 3 nodes there are 6 possibilities)

– B) Normalize the number of transitive triads by the number of cases where a single link could complete the triad, i.e. norm {AB, BC, AC} by {AB, BC, anything} (for 3 nodes there are 4 possibilities)

[Figure: example triad with nodes A, B, C and ties numbered 1 to 3]


Transitivity IV

Method A: 146/720 ≈ 0.20

Method B: 146/217 ≈ 0.67
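Both ratios can be recomputed from the adjacency matrix of the information exchange network. The sketch below copies that matrix (row = sender, column = receiver, diagonal set to 0) and counts ordered triples; the variable names are illustrative only.

```python
# Knoke information exchange matrix as given in the adjacency matrix slide.
A = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],  # 1 Coun
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],  # 2 Comm
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],  # 3 Educ
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],  # 4 Indu
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # 5 Mayr
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],  # 6 WRO
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],  # 7 News
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],  # 8 UWay
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],  # 9 Welf
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],  # 10 West
]
n = len(A)
ordered_triples = two_paths = transitive = 0
for i in range(n):
    for j in range(n):
        for k in range(n):
            if len({i, j, k}) < 3:
                continue  # the three actors must be distinct
            ordered_triples += 1
            if A[i][j] and A[j][k]:      # i -> j -> k: one tie could close it
                two_paths += 1
                if A[i][k]:              # i -> k is present: transitive
                    transitive += 1

print(transitive, ordered_triples, two_paths)
print(round(transitive / ordered_triples, 4), round(transitive / two_paths, 4))
```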


Clustering

Most actors live in local neighborhoods and are connected to one another. A large proportion of the total number of ties is highly "clustered" into local neighborhoods.

VS.


Global clustering coefficient

Global clustering coefficient = number of closed triplets / number of all triplets (open and closed)


Average Local Clustering coefficient

To measure how clustered the graph is, we examine the local neighborhood of an actor (all actors who are directly connected to ego) and calculate the density in this neighborhood (leaving out ego). After doing this for all actors, we can characterize the degree of clustering as an average over all the neighborhoods.

C = 1 C = 1/3 C = 0
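A minimal sketch of both coefficients for undirected graphs stored as neighbour sets. The function names and example graphs are hypothetical, and assigning 0 to actors with fewer than two neighbours is a convention chosen here, not something the slides prescribe.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Density of ties among the neighbours of v (ego itself left out)."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 0.0  # assumed convention for actors with fewer than two neighbours
    possible = len(nbrs) * (len(nbrs) - 1) / 2
    present = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return present / possible

def average_local_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def global_clustering(adj):
    """Closed triplets divided by all triplets (paths of length two)."""
    triplets = closed = 0
    for v in adj:
        for a, b in combinations(adj[v], 2):
            triplets += 1
            closed += b in adj[a]
    return closed / triplets if triplets else 0.0

# Hypothetical graphs: a triangle, a star, and a triangle with one pendant actor.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
paw = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
for name, graph in (("triangle", triangle), ("star", star), ("paw", paw)):
    print(name, average_local_clustering(graph), global_clustering(graph))
```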


Individual local clustering coefficient (in this case for directed ties)

Clustering can also be examined for each actor:

– Notice that actor 6 has three neighbors and hence only 3 possible ties among them. Of these only one is present, so actor 6 is not highly clustered.

– Actor 8 has 6 neighbors and hence 15 pairs of neighbors, and is highly clustered.

2 edges out of 6 edges


Density for groups

Instead of calculating the density of the whole network (last lecture), we can calculate the density of partitions of the network.

Governmental agencies Non-governmental generalist Welfare specialists

A social structure in which individuals were highly clustered would display a pattern of high densities on the diagonal, and low densities elsewhere.


Density for groups II

• Group 1 has dense in- and out-ties to one another and to the other populations

• Group 2 has out-ties among its own members and with group 1, and has high densities of in-ties with all three subpopulations

The extent to which those blocks characterize all the individuals within them can be assessed by looking at the standard deviations. The standard deviations measure the lack of homogeneity within the partition, or the extent to which the actors vary.

The density in the 1,1 block is .6667. That is, of the six possible directed ties among actors 1, 3, and 5, four are actually present.
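The .6667 for the 1,1 block can be checked against the adjacency matrix of the information exchange network. In this sketch the helper name `block_density` is hypothetical; only the block made of actors 1, 3 and 5 is taken from the slide.

```python
# Knoke information exchange matrix (row = sender, column = receiver).
A = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
]

def block_density(matrix, senders, receivers):
    """Share of the possible directed ties from senders to receivers that exist."""
    cells = [(i, j) for i in senders for j in receivers if i != j]
    return sum(matrix[i][j] for i, j in cells) / len(cells)

group_1 = [0, 2, 4]  # actors 1, 3 and 5 as zero-based indices
density_11 = block_density(A, group_1, group_1)
print(round(density_11, 4))
```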


E-I Index

• The E-I (external – internal) index takes the number of ties of group members to outsiders, subtracts the number of ties to other group members, and divides by the total number of ties.

(1-4)/7 = -3/7 (1-2)/7 = -1/7


E-I Index II

• The resulting E-I index ranges from -1 (all ties internal) to +1 (all ties external). Ties between members of the same group are ignored.

• The E-I index can be applied at three levels: – entire population

– each group

– each individual

Notice: The relative size of subpopulations (e.g. 10 vs. 1000) has dramatic consequences for the degree of internal and external contacts, even when individuals choose contacts at random.
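A minimal sketch of the E-I index for a whole population, assuming symmetrized (undirected) ties. The function name, the actors and the group labels are made up for illustration.

```python
def ei_index(edges, group):
    """(external ties - internal ties) / all ties, for an undirected edge list."""
    external = sum(1 for a, b in edges if group[a] != group[b])
    internal = len(edges) - external
    return (external - internal) / len(edges)

# Hypothetical example: three internal ties in group 1 and one tie to group 2.
group = {"a": 1, "b": 1, "c": 1, "d": 2}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(ei_index(edges, group))

# Boundary cases: only external ties give +1, only internal ties give -1.
print(ei_index([("c", "d")], group), ei_index([("a", "b")], group))
```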


E-I Index for groups

Notice that the data has been symmetrized


E-I Index for the entire population

Internal: 7*2/64 = 22%; External: 25*2/64 = 78%; E-I: (50-14)/64 = 56%

Notice that the data has been symmetrized


Permutation Tests

To assess whether the E-I index value is significantly different from what would be expected under random mixing, a permutation test is performed.

Notice: Under random mixing, the E-I index would be expected to have a value of .467, which is not much different from the observed .563, especially given the standard error of .078 (the difference of about .10 could be due to chance alone).
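The idea of the permutation test can be sketched as follows: keep the ties fixed, shuffle the group labels across actors many times, and compare the observed index with the shuffled ones. This is an illustrative sketch with hypothetical names and toy data, not the exact procedure of any particular software package.

```python
import random

def ei_index(edges, group):
    """(external ties - internal ties) / all ties, for an undirected edge list."""
    external = sum(1 for a, b in edges if group[a] != group[b])
    return (2 * external - len(edges)) / len(edges)

def permutation_test(edges, group, trials=2000, seed=0):
    """Shuffle group labels across actors and recompute the E-I index each time."""
    rng = random.Random(seed)
    observed = ei_index(edges, group)
    actors = list(group)
    labels = [group[a] for a in actors]
    simulated = []
    for _ in range(trials):
        rng.shuffle(labels)
        simulated.append(ei_index(edges, dict(zip(actors, labels))))
    expected = sum(simulated) / trials
    # Share of permutations with an index at least as low as the observed one.
    p_low = sum(s <= observed for s in simulated) / trials
    return observed, expected, p_low

# Hypothetical toy data: three internal ties in group 1, one tie to group 2.
group = {"a": 1, "b": 1, "c": 1, "d": 2}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
observed, expected, p_low = permutation_test(edges, group)
print(observed, round(expected, 3), round(p_low, 3))
```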


E-I Index for individuals

Notice: Several actors (4, 6, 9) tend toward closure, while others (10, 1) tend toward creating ties outside their groups.

2. Network Measures 2.2 Network Measures for Actors Position & Roles


Positions & Roles

• Structural Equivalence

• Automorphic Equivalence

• Regular Equivalence

• Measuring similarity/dissimilarity

• Visualizing similarity and distance

• Measuring automorphic equivalence

• Measuring regular equivalence

• Blockmodelling


Chinese Kinship Relations


Positions and Roles

• Positions: Actors that show a similar structure of relationships and are thus similarly embedded into the network.

• Roles: The pattern of relationships of members of same or different positions.

• Note: Many of the category systems used by sociologists are based on "attributes" of individual actors that are common across actors.


Similarity

• The idea of "similarity" has to be rather precisely defined

• Nodes are similar if they fall in the same "equivalence class"

– We could, for example, define an equivalence class of actors with an out-degree of zero

• There are three particular definitions of equivalence:

– Structural Equivalence

– Automorphic Equivalence (rarely used)

– Regular Equivalence


Structural Equivalence

• Structural Equivalence: Two structurally equivalent actors could exchange their positions in a network without changing their connections to the other actors in the network.

• Structural equivalence is the "strongest" form of equivalence.

• Problem: Imagine two teachers in Toronto and St. Gallen. Rather than looking for connections to exactly the same persons we would like to find connection to similar persons but not exactly the same ones.


Automorphic Equivalence

• Automorphic Equivalence: Two persons could exchange their positions in the network without changing the structure of the network (notice that after the exchange they would be partially connected to other persons than before)

• Problem: How large should the radius be in which we analyze the structure of the network (1, 2, 3 … steps)?

• For the one-step radius we consider the NUMBER of:

– asymmetric outgoing,

– asymmetric incoming,

– symmetric in- and outgoing,

– and non-existing ties.


1 Step, 2 Step Equivalence



Regular Equivalence

• Regular Equivalence: Two positions are considered similar if every important aspect of the observed structure applies (or does not apply) to both positions.

• For the one-step radius we consider the EXISTENCE of:

– asymmetric outgoing,

– asymmetric incoming,

– symmetric in- and outgoing,

– and non-existing ties.


[Figure: three example graphs (1, 2, 3), each drawn on the nodes A; B, C; D, E, F, G, H]

B and C are regularly equivalent

B and C are automorphically equivalent

B and C are structurally equivalent


Computing Positional Similarity Example Information exchange network


Measuring Similarity Adjacency Matrix

         Coun Comm Educ Indu Mayr WRO  News UWay Welf West
 1 Coun  ---  1    0    0    1    0    1    0    1    0
 2 Comm  1    ---  1    1    1    0    1    1    1    0
 3 Educ  0    1    ---  1    1    1    1    0    0    1
 4 Indu  1    1    0    ---  1    0    1    0    0    0
 5 Mayr  1    1    1    1    ---  0    1    1    1    1
 6 WRO   0    0    1    0    0    ---  1    0    1    0
 7 News  0    1    0    1    1    0    ---  0    0    0
 8 UWay  1    1    0    1    1    0    1    ---  1    0
 9 Welf  0    1    0    0    1    0    1    0    ---  0
10 West  1    1    1    0    1    0    1    0    0    ---


Measuring Similarity Concatenated Row & Colum View

1 Coun 2 Comm 3 Educ 4 Indu 5 Mayr 6 WRO 7 News 8 UWay 9 Welf 10 West
---    1      0      1      1      0     0      1      0      1
1      ---    1      1      1      0     1      1      1      1
0      1      ---    0      1      1     0      0      0      1
0      1      1      ---    1      0     1      1      0      0
1      1      1      1      ---    0     1      1      1      1
0      0      1      0      0      ---   0      0      0      0
1      1      1      1      1      1     ---    1      1      1
0      1      0      0      1      0     0      ---    0      0
1      1      0      0      1      1     0      1      ---    0
0      0      1      0      1      0     0      0      0      ---
---    1      0      0      1      0     1      0      1      0
1      ---    1      1      1      0     1      1      1      0
0      1      ---    1      1      1     1      0      0      1
1      1      0      ---    1      0     1      0      0      0
1      1      1      1      ---    0     1      1      1      1
0      0      1      0      0      ---   1      0      1      0
0      1      0      1      1      0     ---    0      0      0
1      1      0      1      1      0     1      ---    1      0
0      1      0      0      1      0     1      0      ---    0
1      1      1      0      1      0     1      0      0      ---


Pearson correlation coefficients, covariances and cross-products

• Pearson correlations (ranging from -1 to +1) summarize pair-wise structural equivalence.


Pairwise Structural Equivalence

We can see, for example, that node 1 and node 9 have identical patterns of ties.

The Pearson correlation measure does not pay attention to the overall prevalence of ties (the mean of the row or column), and it does not pay attention to differences between actors in the variances of their ties.

Often it is desirable to focus only on the pattern, rather than on the mean and variance, as the aspect of similarity between actors.
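A minimal sketch of the correlation approach: each actor's profile is the row of out-ties followed by the column of in-ties, and two profiles are compared with the Pearson coefficient. The function names and the four-actor network are hypothetical, and leaving out the ties of the two actors to themselves and to each other is one possible treatment of the diagonal (an assumption here).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant profiles

def profile(matrix, i, skip):
    """Out-ties (row) followed by in-ties (column) of actor i, skipping some actors."""
    others = [k for k in range(len(matrix)) if k not in skip]
    return [matrix[i][k] for k in others] + [matrix[k][i] for k in others]

def structural_similarity(matrix, i, j):
    # Ties of i and j to themselves and to each other are left out (an assumption).
    return pearson(profile(matrix, i, {i, j}), profile(matrix, j, {i, j}))

# Hypothetical network: actor 0 sends to 1 and 2, and both 1 and 2 send to 3.
M = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(structural_similarity(M, 1, 2), structural_similarity(M, 0, 3))
```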



Euclidean squared distances

Euclidean or squared Euclidean distances are not sensitive to the linearity of association and can be used with valued or binary data.

Other, similar measures are the Jaccard coefficient and the Hamming distance.


Going from pairs to groups of structural equivalence

It is often useful to examine the similarities or distances to try to locate groupings of actors (that is, larger than a pair) who are similar. By studying the bigger patterns of which groups of actors are similar to which others, we may also gain some insight into "what about" the actors' positions is most critical in making them more similar or more distant.

In the next two sections we will cover how multi-dimensional scaling and hierarchical cluster analysis can be used to identify patterns in actor-by-actor similarity/distance matrices.

Both of these tools are widely used in non-network analysis; there are large and excellent literatures on the many important complexities of using these methods. Our goal here is just to provide a very basic introduction.


Hierarchical Clustering

• Hierarchical Clustering:

– Initially places each case in its own cluster

– The two most similar cases are then combined

– This process is repeated until all cases are agglomerated into a single cluster (once a case has been joined it is never re-classified)
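The steps above can be sketched as a small agglomerative procedure. The slide does not say how the distance between two clusters is defined, so this sketch assumes single linkage (distance between the two closest members); the function name and the distance matrix are hypothetical.

```python
def agglomerate(dist, labels):
    """Single-link agglomerative clustering; returns the merges in order."""
    clusters = [frozenset([i]) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the two closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append((sorted(labels[i] for i in merged), d))
        # Joined cases stay together: the two clusters are replaced by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical distance matrix for four actors.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
merges = agglomerate(dist, labels)
for members, level in merges:
    print(level, members)
```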


Multi Dimensional Scaling

• MDS represents the patterns of similarity or dissimilarity in the profiles among the actors as a "map" in a multi-dimensional space. This map lets us see how "close" actors are and whether they "cluster".

– Stress is a measure of badness of fit

– The author has to determine the meaning of the dimensions


Finding automorphic equivalence (for binary data)

• Brute Force Approach: All the nodes of a graph are exchanged and the distances among all pairs of actors in the new graph are compared to the original one. When the new and the old graph have the same distances among nodes, the "swapping" that was done has identified an automorphic position.

• Brute Force is expensive (10! = 3,628,800 permutations for ten actors!)
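A brute-force search can be sketched for a very small graph. Note the swapped-in criterion: instead of comparing distance matrices, this sketch checks whether a relabelling maps the arc set exactly onto itself, which is the usual definition of an automorphism. The function name and the five-actor example are hypothetical.

```python
from itertools import permutations
from math import factorial

def automorphic_classes(nodes, arcs):
    """Group nodes that some arc-preserving relabelling maps onto each other."""
    arc_set = set(arcs)
    parent = {v: v for v in nodes}  # simple union-find representative

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        image = {(mapping[a], mapping[b]) for a, b in arc_set}
        if image == arc_set:  # the relabelling is an automorphism
            for v in nodes:
                parent[find(v)] = find(mapping[v])
    classes = {}
    for v in nodes:
        classes.setdefault(find(v), []).append(v)
    return sorted(classes.values())

# Hypothetical hierarchy: A sends to B and C, B sends to D, C sends to E.
nodes = ["A", "B", "C", "D", "E"]
arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
classes = automorphic_classes(nodes, arcs)
print(classes)
print(factorial(10))  # number of orderings to test for ten actors
```

With five actors only 120 orderings are tested; the factorial growth is what makes the approach impractical for larger networks.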


Regular Equivalence Block Matrix

Informal Definition: Two actors are regularly equivalent if they have similar patterns of ties to equivalent others.

Problem: Each definition of each position depends on its relations with other positions. Where to start?

Sender

Repeater

Receiver


Regular Equivalence Block Matrix Block Image

• Create a matrix so that each actor in each partition has the same pattern of connection to actors in the other partition.

– Notice: We don't care about ties among members of the same regular class!

– A sends to {BCD} but to none of {EFGHI}

– {BCD} does not send to A but to {EFGHI}

– {EFGHI} does not send to A or {BCD}

Block image:

            A    B,C,D  E,F,G,H,I
A           ---  1      0
B,C,D       0    ---    1
E,F,G,H,I   0    0      ---

Adjacency matrix:

    A    B    C    D    E    F    G    H    I
A   ---  1    1    1    0    0    0    0    0
B   0    ---  0    0    1    1    0    0    0
C   0    0    ---  0    0    0    1    0    0
D   0    0    0    ---  0    0    0    1    1
E   0    0    0    0    ---  0    0    0    0
F   0    0    0    0    0    ---  0    0    0
G   0    0    0    0    0    0    ---  0    0
H   0    0    0    0    0    0    0    ---  0
I   0    0    0    0    0    0    0    0    ---
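The block image can be derived from the adjacency matrix and a proposed partition, and the partition can be checked for regularity. In this sketch the arc list is read off the matrix above; the function names are hypothetical, and the regularity test used (in every non-empty block each row actor sends at least one tie and each column actor receives at least one) is the common textbook criterion.

```python
arcs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("B", "F"),
        ("C", "G"), ("D", "H"), ("D", "I")]
blocks = [["A"], ["B", "C", "D"], ["E", "F", "G", "H", "I"]]

def block_image(arcs, blocks):
    """1 if any tie runs from the row class to the column class, else 0."""
    arc_set = set(arcs)
    image = []
    for src in blocks:
        row = []
        for dst in blocks:
            if src is dst:
                row.append(None)  # ties inside a class are ignored
            else:
                row.append(int(any((s, r) in arc_set for s in src for r in dst)))
        image.append(row)
    return image

def is_regular(arcs, blocks):
    """Check every off-diagonal block: empty, or each sender sends and each receiver receives."""
    arc_set = set(arcs)
    for src in blocks:
        for dst in blocks:
            if src is dst:
                continue
            sends = [any((s, r) in arc_set for r in dst) for s in src]
            receives = [any((s, r) in arc_set for s in src) for r in dst]
            if any(sends) and not (all(sends) and all(receives)):
                return False
    return True

image = block_image(arcs, blocks)
print(image)
print(is_regular(arcs, blocks))
```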


Algorithms for detection of Regular Equivalence Tabu Search

• This method of blocking relies on extensive use of the computer. Tabu search tries to implement the same idea of grouping together the actors who are most similar into a block.

• Tabu search does this by searching for sets of actors who, if placed into a block, produce the smallest sum of within-block variances in the tie profiles.

• If actors in a block have similar ties, their variance around the block mean profile will be small.

• So, the partitioning that minimizes the sum of within-block variances minimizes the overall variance in tie profiles.


Algorithms for detection of Regular Equivalence Tabu Search Results

[Figure: information exchange network, nodes 1 to 10, grouped by the tabu search result]

(2,5) for example, are pure "repeaters"

The set { 6, 10, 3 } send to only two other types (not all three other types) and receive from only one other type.


Blockmodeling

Blockmodeling is able to include all kinds of equivalences into one analysis

Examples of blocks:

• Complete blocks (everybody is connected with each other inside the block)

• Null blocks (people in this block are not connected to anybody)

• Regular blocks, people share the same regular equivalence class in this block


Blockmodels Matrix Permutation


Blockmodels

Student Government. Discussion relation among the eleven students who were members of the student government at the University of Ljubljana in Slovenia. The students were asked to indicate with which of their fellow members they informally discussed matters concerning the administration of the university.


General Blockmodelling with predefined partitions


Blockmodeling based on actors-attributes


Blockmodels Matrix Representation


Blockmodels Matrix Permutation

2. Network Measures 2.3 Network Measures Subgroups Cohesive Subgroups


Cohesive Subgroups

Cohesive subgroups: We hypothesize that cohesive subgroups are the basis for solidarity, shared norms, identity and collective behavior. Perceived similarity, for instance, membership of a social group, is expected to promote interaction. We expect similar people to interact a lot, at least more often than with dissimilar people.


Example – Families in Haciendas (1948)

Each arc represents "frequent visits" from one family to another.


Components

A semiwalk from vertex u to vertex v is a sequence of lines such that the end vertex of one line is the starting vertex of the next line and the sequence starts at vertex u and ends at vertex v.

A walk is a semiwalk with the additional condition that none of its lines is an arc of which the end vertex is the arc's tail (that is, every arc is followed in its own direction).

Note that v5 v3 v4 v5 v3 is also a walk to v3


Paths

A semipath is a semiwalk in which no vertex in between the first and last vertex of the semiwalk occurs more than once.

A path is a walk in which no vertex in between the first and last vertex of the walk occurs more than once.


Connectedness

A network is (weakly) connected if each pair of vertices is connected by a semipath.

A network is strongly connected if each pair of vertices is connected by a path.

This network is not connected because v2 is isolated.


Connected Components

A (weak) component is a maximal (weakly) connected subnetwork.

A strong component is a maximal strongly connected subnetwork.

v1,v3,v4,v5 are a weak component v3,v4,v5 are a strong component
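Weak and strong components can be sketched with a simple reachability search. The arc list below is an assumption: v5→v3, v3→v4 and v4→v5 follow from the walk mentioned earlier, while v1→v3 is invented so that v1 joins the weak component as stated on the slide. Function names are hypothetical.

```python
def reachable(start, neighbours):
    """All vertices that can be reached from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in neighbours.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def components(nodes, arcs, strong):
    out, both = {}, {}
    for a, b in arcs:
        out.setdefault(a, set()).add(b)    # follow arcs in their direction (paths)
        both.setdefault(a, set()).add(b)   # ignore direction (semipaths)
        both.setdefault(b, set()).add(a)
    reach = {v: reachable(v, out if strong else both) for v in nodes}
    found = []
    for v in nodes:
        if any(v in c for c in found):
            continue
        # Same component: each vertex reaches the other.
        found.append({w for w in reach[v] if v in reach[w]})
    return found

nodes = ["v1", "v2", "v3", "v4", "v5"]
arcs = [("v1", "v3"), ("v5", "v3"), ("v3", "v4"), ("v4", "v5")]  # v1->v3 is assumed
weak = components(nodes, arcs, strong=False)
strong = components(nodes, arcs, strong=True)
print(weak)
print(strong)
```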


Example Strong Components

1. Net > Components > {Strong, Weak}


Cliques and Complete Subnetworks

A clique is a maximal complete subnetwork containing three vertices or more. (cliques can overlap)

v1,v6,v5 is a clique

v2,v4,v5 is not a clique

v2,v3,v4,v5 is a clique


n-Clique & n-Clan

2-Clique

2-Clan

n-Clique: A maximal subgraph in which every pair of nodes is at most distance n apart, with distances measured in the analyzed (complete) graph. A clique is an n-clique with n=1.

n-Clan: An n-clique in which every pair of nodes is also at most distance n apart within the resulting subgraph itself.


n-Clans & n-Cliques

2-Clans: 123,234,345,456,561,612

2-Cliques: 123,234,345,456,561,612 and 135,246

[Figure: example graph with nodes 1 to 6]
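The listed 2-clans and 2-cliques can be reproduced by brute force if the example graph is taken to be a ring 1-2-3-4-5-6-1. That ring is an assumption (the figure is not reproduced here), as are the function names and the minimum subgroup size of three.

```python
from itertools import combinations

def distances(nodes, edges):
    """All-pairs shortest path lengths by breadth-first search (undirected)."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        if a in adj and b in adj:  # keep only ties inside the given node set
            adj[a].add(b)
            adj[b].add(a)
    dist = {}
    for s in nodes:
        dist[s] = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for v in frontier:
                for w in adj[v]:
                    if w not in dist[s]:
                        dist[s][w] = dist[s][v] + 1
                        nxt.append(w)
            frontier = nxt
    return dist

def within(group, dist, n):
    return all(dist[a].get(b, float("inf")) <= n for a, b in combinations(group, 2))

def n_cliques(nodes, edges, n):
    dist = distances(nodes, edges)  # distances in the whole graph
    candidates = [set(g) for size in range(3, len(nodes) + 1)
                  for g in combinations(nodes, size) if within(g, dist, n)]
    return [g for g in candidates if not any(g < other for other in candidates)]

def n_clans(nodes, edges, n):
    # Keep the n-cliques whose members stay within distance n inside the subgroup.
    return [g for g in n_cliques(nodes, edges, n)
            if within(g, distances(sorted(g), edges), n)]

nodes = [1, 2, 3, 4, 5, 6]
ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]  # assumed example graph
cliques = sorted("".join(map(str, sorted(g))) for g in n_cliques(nodes, ring, 2))
clans = sorted("".join(map(str, sorted(g))) for g in n_clans(nodes, ring, 2))
print(cliques)
print(clans)
```

The subgroups 135 and 246 are 2-cliques but not 2-clans, because their members are only connected through actors outside the subgroup.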


k-Plexes

2-Plexes: 1234, 2345, 3456, 4561, 5612, 6123

In general, k-plexes are more robust than cliques and clans.

[Figure: example graph with nodes 1 to 6]

k-Plex: A k-plex is a maximal subgraph with gs nodes in which each node is connected to at least gs-k of the other nodes.


Overview Subgroups

[Figure: five example graphs, each on the nodes 1, 2, 3, 4]

Graph 1: 2 Components

Graph 2: 1 Component, 2 2-Clans (341, 412), 2 2-Cliques (341, 412)

Graph 3: 1 Component, 1 2-Clan (124), 1 2-Clique (124)

Graph 4: 1 Component, 1 2-Clan (1234), 1 2-Clique (1234), 1 2-Plex (1234)

Graph 5: 1 Component, 1 2-Clan (1234), 1 2-Clique (1234), 1 2-Plex (1234), 1 Clique


Overview Group Concepts

• 1-Clique, 1-Clan and 1-Plex are identical

• An n-Clan is always included in a higher order n-Clique

Component

2-Clique

2-Clan

2-Plex

Clique


k-Cores

• Net > Components > {Strong, Weak}

A k-core is a maximal subnetwork in which each vertex has at least degree k within the subnetwork.


k-Cores

k-cores are nested which means that a vertex in a 3-core is also part of a 2-core but not all members of a 2-core belong to a 3-core.
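The nesting can be seen by computing, for every vertex, the highest k-core it belongs to. This sketch uses the standard pruning idea (repeatedly remove vertices whose remaining degree is at most k); the function name and example graph are hypothetical.

```python
def core_numbers(adj):
    """Highest k such that the vertex belongs to a k-core (undirected graph)."""
    remaining = {v: set(nb) for v, nb in adj.items()}
    core, k = {}, 0
    while remaining:
        low = [v for v, nb in remaining.items() if len(nb) <= k]
        if not low:
            k += 1  # nobody left to prune at this level
            continue
        for v in low:
            core[v] = k
            for w in remaining.pop(v):
                remaining[w].discard(v)
    return core

# Hypothetical graph: a complete group a-b-c-d with a chain a-e-f attached.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a", "f"},
    "f": {"e"},
}
core = core_numbers(adj)
three_core = {v for v, k in core.items() if k >= 3}
two_core = {v for v, k in core.items() if k >= 2}
print(core)
```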


k-Cores Application

• K-cores help to detect cohesive subgroups by removing the lowest k-cores from the network until the network breaks up into relatively dense components.

• Net > Partitions > Core >{Input, Output, All}


3. Network Mechanisms


Network Mechanisms

• Tie Outdegree Effect

• Reciprocity

• Transitivity & Three-Cycles Effect

• Balance Effect

• In/Out Popularity Effect

• In/Out Activity Effect

• In/Out Assortativity Effect

• Covariate Similarity Effect

• Covariate Ego-Effect

• Covariate Alter-Effect

• Same Covariate Effect


Outdegree Effect

• The most basic effect is defined by the outdegree of actor i. It represents the basic tendency to have ties at all.

• In a decision-theoretic approach this effect can be regarded as the balance of benefits and costs of an arbitrary tie.

• Most networks are sparse (i.e., they have a density well below 0.5). This can be represented by saying that for a tie to an arbitrary other actor (arbitrary meaning here that the other actor has no characteristics or tie pattern making him/her especially attractive to i), the costs will usually outweigh the benefits. Indeed, in most cases a negative parameter is obtained for the outdegree effect.


Reciprocity Effect

• Another quite basic effect is the tendency toward reciprocity, represented by the number of reciprocated ties of actor i. This is a basic feature of most social networks (cf. Wasserman and Faust, 1994, Chapter 13)

[Figure: reciprocated tie between actors i and j]


Transitivity and other triadic effects

• Next to reciprocity, an essential feature in most social networks is the tendency toward transitivity, or transitive closure (sometimes called clustering): friends of friends become friends, or in graph-theoretic terminology: two-paths tend to be, or to become, closed (e.g., Davis 1970, Holland and Leinhardt 1971).

[Figure: a transitive triplet and a three-cycle, each among actors i, j and h]
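The effects discussed so far correspond to simple counts in actor i's neighbourhood. The slides give no formulas, so the sketch below uses the definitions commonly found in actor-oriented models (x[i][j] = 1 if i sends a tie to j); the function name and the three-actor example are hypothetical.

```python
def effect_statistics(x, i):
    """Outdegree, reciprocity and triadic statistics for actor i."""
    others = [j for j in range(len(x)) if j != i]
    pairs = [(j, h) for j in others for h in others if h != j]
    return {
        "outdegree": sum(x[i][j] for j in others),
        "reciprocity": sum(x[i][j] * x[j][i] for j in others),
        # i -> j, j -> h and i -> h
        "transitive_triplets": sum(x[i][j] * x[j][h] * x[i][h] for j, h in pairs),
        # i -> j, j -> h and h -> i
        "three_cycles": sum(x[i][j] * x[j][h] * x[h][i] for j, h in pairs),
    }

# Hypothetical network: 0 -> 1, 1 -> 0, 1 -> 2, 0 -> 2.
x = [
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
stats = effect_statistics(x, 0)
print(stats)
```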


Balance Effect

• An effect closely related to transitivity is balance (Newcomb, 1962), which is the same as structural equivalence with respect to out-ties (Burt, 1982): the tendency to have and create ties to other actors who make the same choices as ego.

[Figure: example with actors A, B, C, D]


In/Out Popularity Effect

• The degree-related popularity effect is based on the indegree or outdegree of an actor. Nodes with higher indegree, or higher outdegree, are more attractive for others to send a tie to.

• This implies that high indegrees reinforce themselves, which will lead to a relatively high dispersion of the indegrees (a Matthew effect in popularity as measured by indegrees, cf. Merton, 1968 and Price, 1976).

[Figure: example with actors A, B, C, D]


In/Out Activity Effect

• Nodes with higher indegree, or higher outdegree respectively, will have an extra propensity to form ties to others.

• The outdegree-related activity effect again is a self-reinforcing effect: when it has a positive parameter, the dispersion of outdegrees will tend to increase over time, or to be sustained if it already is high.

[Figure: example with actors A, B, C, D]


Preferential Attachment

• Notice: These four degree-related effects can be regarded as the analogues in the case of directed relations of what was called cumulative advantage by Price (1976) and preferential attachment by Barabasi and Albert (1999) in their models for dynamics of non-directed networks: a self-reinforcing process of degree differentiation.


In/Out Assortativity Effect

• Preferences of actors depend on their degrees. Depending on their own out- and in-degrees, actors can have differential preferences for ties to others who also have high or low out- and in-degrees (Morris and Kretzschmar 1995; Newman 2002)

[Figure: example with actors A, B, C, D, E, F]


Covariate Similarity Effect

• The covariate similarity effect describes whether ties tend to occur more often between actors with similar values on a covariate (homophily effect). Tendencies to homophily constitute a fundamental characteristic of many social relations, see McPherson, Smith-Lovin, and Cook (2001).

• Example: iPad owners tend to be friends with other iPad owners.


Covariate Ego Effect

• The covariate ego effect, describes that actors with higher values on a covariate tend to nominate more friends and hence have a higher outdegree.

• Example: Heavier smokers have more friends.


Covariate Alter Effect

• The alter effect describes whether actors with higher V values will tend to be nominated by more others and hence have higher indegrees.

• Example: Beautiful people have more friends.


Modeling networks

1. Actor-based modeling for longitudinal data

– SIENA (analysis of repeated measures on social networks and MCMC estimation of exponential random graphs)

2. Stochastic modeling for panels

– Pnet

objective function | Model 1: estim, s.e., p | Model 2: estim, s.e., p | Model 3: estim, s.e., p

outdegree (density) -2,46 0,12 <0,0001* -4,04 0,23 <0,0001* -1,99 0,13 <0,0001*

reciprocity 2,57 0,20 <0,0001* 2,29 0,22 <0,0001* 3,02 0,21 <0,0001*

transitive triplets 0,07 0,01 <0,0001*

transitive mediated triplets -0,03 0,01 0,0005*

transitive ties 1,47 0,24 <0,0001*

3-cycles -0,06 0,02 0,0037*

attribute party 1,13 0,15 <0,0001* 0,73 0,15 <0,0001*

attribute gender -0,11 0,15 0.48

3. Network Theories

Homophily & Assortativity Power Laws & Preferential Attachment The Strength of Weak Ties Small Worlds Social Capital

3.1Homophily


Homophily

• Homophily (i.e., love of the same) is the tendency of individuals to associate and bond with similar others. (Mechanisms of selection vs influence)

• In the study of networks, assortative mixing is a bias in favor of connections between network nodes with similar characteristics. In the specific case of social networks, assortative mixing is also known as homophily. The rarer disassortative mixing is a bias in favor of connections between dissimilar nodes.

Low Homophily High Homophily


Homophily II

Types (acc. to McPherson et al. 2001):

– Race and Ethnicity (Marsden 1987, 88; Louch 2000; Kalleberg et al. 1996; Laumann 1973…)

– Sex and Gender (Maccoby 1998, Eder & Hallinan 1978, Shrum et al. 1988, Huckfeldt & Sprague 1995, Brass 1985…)

– Age (Fischer 1977, 82; Feld 1982; Blau et al. 1991; Burt 1990, 91…)

– Religion (Laumann 1973, Verbrugge 1977, Fischer 1977, 82, Marsden 1988, Louch 2000…)

– Education, Occupation and Social Class (Laumann 1973, Marsden 1987, Verbrugge 1977, Wright 1997, Kalmijn 1998…)

– Network Positions (Brass 1985, Burt 1982, Friedkin 1993…)

– Behavior (Cohen 1977, Kandel 1978, Knoke 1990…)

– Attitudes, Abilities, Beliefs and Aspirations (Jussim & Osgood 1989, Huckfeldt & Sprague 1995, Verbrugge 1977, 83, Knoke 1990)


Schelling's Segregation Demo

http://ccl.northwestern.edu/netlogo/models/run.cgi?Segregation.734.460

3.2 Power Laws & Preferential Attachment


Power Law distribution

• As a function of k, what fraction of pages on the Web have k in-links?

• A natural guess: the normal, or Gaussian, distribution

• Central Limit Theorem (roughly): if we take any sequence of small independent random quantities, then in the limit their sum will be distributed according to the normal distribution


Power Law distribution

But when people measured the Web, they found something very different: The fraction of Web pages that have k in-links is approximately proportional to 1/k^2

• Power law function

• Popularity exhibits extreme imbalances: there are few very popular Web pages that have extremely many in-links

True for other domains:

• the fraction of telephone numbers that receive k calls per day: 1/k^2

• the fraction of books bought by k people: 1/k^3

• the fraction of scientific papers that receive k citations: 1/k^3


Preferential attachment leads to power laws

• A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. Notice: "Preferential attachment" (A.L. Barabasi and R. Albert 1999) is only the most recent of many names that have been given to such processes.

• Notice: Preferential attachment can, under suitable circumstances, generate power law distributions.
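A very small growth simulation illustrates the mechanism. It is a sketch under simplifying assumptions (one undirected tie per newcomer, degree-proportional choice of the target); the function name and parameters are hypothetical and it is not meant to reproduce any specific published model in detail.

```python
import random
from collections import Counter

def preferential_attachment(n, seed=0):
    """Grow a network to n nodes; each newcomer links to one existing node
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    endpoints = [0, 1]  # every tie contributes both of its endpoints
    degree = Counter({0: 1, 1: 1})
    for new in range(2, n):
        target = rng.choice(endpoints)  # degree-proportional choice
        endpoints += [new, target]
        degree[new] += 1
        degree[target] += 1
    return degree

degree = preferential_attachment(2000)
histogram = Counter(degree.values())  # how many nodes have degree k
for k in sorted(histogram)[:5]:
    print(k, histogram[k] / len(degree))
print("largest degree:", max(degree.values()))
```

Most nodes end up with a single tie while a few accumulate many, which is the heavy-tailed pattern described above.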


Preferential Attachment Demo

DEMO with NETLOGO

http://ccl.northwestern.edu/netlogo/models/run.cgi?PreferentialAttachment.836.527


3.3 Balance Theory


Balance Theory Fritz Heider

Fritz Heider (1940): A person (P) feels uncomfortable when he or she disagrees with his or her friend (O) on a topic (X).

P feels an urge to change this imbalance. He can adjust his opinion, change his affection for O, or convince himself that O is not really opposed to X.


Balance Theory

(a) + + + : three people are mutual friends

(b) + + - : A is friends with B and C, but B and C are enemies

(c) - - + : two people are friends, and they have a mutual enemy in the third

(d) - - - : all are enemies; this motivates two of them to "team up" against the third

b and d represent unstable relationships
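The four cases follow one rule: a triad is balanced when the product of its three signs is positive. A minimal sketch (the function name is hypothetical):

```python
def is_balanced(signs):
    """A triad is balanced when the product of its three tie signs is positive."""
    product = 1
    for s in signs:
        product *= s
    return product > 0

triads = {
    "a": (+1, +1, +1),  # three mutual friends
    "b": (+1, +1, -1),  # one actor is friends with two enemies
    "c": (-1, -1, +1),  # two friends with a common enemy
    "d": (-1, -1, -1),  # all enemies
}
stable = {name: is_balanced(signs) for name, signs in triads.items()}
print(stable)
```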


Balance Theory Community in a New England Monastery

Young Turks (1), Loyal Opposition (2), Outcasts (3), Interstitial Group (4)


Balance Theory International Relations

3.4 Strength of weak ties


Strength of Weak Ties Mark Granovetter

• "One of the most influential sociology papers ever written" (Barabasi)

– One of the most cited (Current Contents, 1986)

• Accepted by the American Journal of Sociology after 4 years of unsuccessful attempts elsewhere.

• Interviewed people and asked: "How did you find your job?"

– Kept getting the same answer: "through an acquaintance, not a friend"


Basic Argument

• Classify interpersonal relations as “strong”, “weak”, or “absent”

• Strength is (vaguely) defined as "a (probably linear) combination of…

– the amount of time,

– the emotional intensity,

– the intimacy (mutual confiding),

– and the reciprocal services which characterize the tie"

• The stronger the tie between two individuals, the larger the proportion of people to which they are both tied (weakly or strongly)


Strong Ties

• If person A has a strong tie to both B and C, then it is unlikely for B and C not to share a tie.

B C

A


Weak Ties for Information Diffusion

"Intuitively speaking, this means that whatever is to be diffused can reach a larger number of people, and traverse greater social distance, when passed through weak ties rather than strong."

3.5 Small World Phenomenon


Connectivity and the Small World

1. Travers and Milgram’s work on the small world is responsible for the standard belief that “everyone is connected by a chain of about 6 steps.”

2. Two questions:

– Given what we know about networks, what is the longest path (defined by handshakes) that separates any two people?

– Is 6 steps a long distance or a short distance?


Example: Two hermits on opposite sides of the country

[Figure: chain of acquaintances linking the OH hermit and the Mt. hermit. On each side the chain runs from the hermit through a store owner, a truck driver, a manager, a corporate manager, a corporate president and a congress representative.]


Milgram's Test

Milgram's test: Send a packet from sets of randomly selected people to a stockbroker in Boston.

Experimental Setup: Arbitrarily select people from 3 pools:

– People in Boston

– Random people in Nebraska

– Stockholders in Nebraska


Results

• Most chains found their way through a small number of intermediaries.

• What do these two findings tell us of the global structure of social relations?


Results II

1. Social networks contain a lot of short paths

2. People acting without any sort of global ‘map’ are effective at collectively finding these short paths


The Watts-Strogatz model

• Suppose everyone lives on a two-dimensional grid (as a model of geographic proximity)

• Two main principles explain short paths: homophily and weak ties

– Homophily: every node forms a link to all other nodes that lie within a radius of r grid steps

– Weak ties: each node forms a link to k other random nodes


Watts-Strogatz


The Watts-Strogatz model

• Suppose we only allow one out of every k nodes to have a single random friend

• A k * k square then has k random links; consider it as a single node

• A surprisingly small amount of randomness is enough to make the world "small", with short paths between every pair of nodes


Decentralized Search

• People are able to collectively find short paths to the designated target while they don’t know the global ‘map’ of all connections

• Breadth-first search vs. tunneling

• Modeling: – Can we construct a network where decentralized search succeeds?

– If yes, what are the qualitative properties of such a network?


A model for decentralized search

• A starting node s is given a message that it must forward to a target node t

• s knows only the location of t on the grid, but s doesn’t know the edges out of any other node

• Model must span all the intermediate ranges of scale as well


Modeling the process of decentralized search

• We adapt the model by introducing a clustering exponent q

• For two nodes v and w, d(v,w) is the number of grid steps between them

• Random edges are now generated with probability proportional to d(v,w)^(-q)

• The model changes with different values of q:

– q=0: links are chosen uniformly at random

– when q is very small: long-range links are "too random"

– when q is large: long-range links are "not random enough"


Varying clustering exponent


Decentralized Search when q=2

Experiments show that decentralized search is most efficient when q=2 (random links follow an inverse-square distribution)


What’s special about q=2

• Consider the group of nodes at distances between d and 2d from a node v. Since area in the plane grows like the square of the radius, the total number of nodes in this group is proportional to d^2

• The probability of linking to any one of these nodes is proportional to d^(-2), so the probability that a random edge links into some node in this ring is approximately independent of the value of d

• Long-range weak ties are being formed in a way that is spread roughly uniformly over all different scales of resolution

Think of the postal system: country, state, city, street, and finally the street number
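The scale argument can be checked numerically. The sketch below assumes an unbounded two-dimensional grid with distance counted in grid steps, where exactly 4r nodes lie at distance r from a given node; the function name `ring_weight` is hypothetical. It sums the link weights r^(-q) over the ring d ≤ r < 2d.

```python
from math import log

def ring_weight(d, q):
    """Total link weight into nodes at grid distance d <= r < 2d from a node,
    when a link over distance r has weight r**(-q).
    Assumes an unbounded grid with 4*r nodes at grid distance r."""
    return sum(4 * r * r ** (-q) for r in range(d, 2 * d))

for d in (8, 16, 32, 64):
    print(d, ring_weight(d, 0), round(ring_weight(d, 2), 3), round(ring_weight(d, 3), 4))

print("limit for q=2:", round(4 * log(2), 3))
```

For q=2 the weight per ring stays close to 4·ln 2 at every scale, for q=0 it grows with d (distant rings dominate), and for q=3 it shrinks (nearby rings dominate).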


Small-World Phenomenon Conclusions I

1. Start from Milgram's experiment: (1) it seems there are short paths and (2) people know how to find them effectively

2. Build mathematical models for (1) and (2)

3. Make a prediction based on the models: clustering exponent q=2

4. Validate this prediction using real data from large social networks (LiveJournal, Facebook)

Why do social networks arrange themselves in a pattern of friendships across distance that is close to optimal for forwarding messages to far-off targets?


Small-World Phenomenon Conclusions II

• If there are dynamic forces or selective pressures driving the network toward this shape, they must be more implicit, and it remains a fascinating open problem to determine whether such forces exist and how they might operate.

• Robustness, Search, Spread of disease, opinion formation, spread of computer viruses, gossip,…

• For example: Diseases move more slowly in highly clustered graphs

• The dynamics are very non-linear -- with no clear pattern based on local connectivity.

Implication: small local changes (shortcuts) can have dramatic global outcomes (think of disease diffusion)


Small World Construction

• Network changes from structured to random

• Given 6 billion nodes, the average path length L starts at 3 million and decreases to 4 (!)

• Clustering: starts at 0.75, decreases to zero

• Most important is what happens ALONG the way.
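The transition from a structured to a random network can be sketched with a ring lattice whose ties are rewired with probability p. This is an illustrative sketch in the spirit of the rewiring construction; the function names, network size and parameter values are assumptions, and path lengths are averaged over reachable pairs only.

```python
import random
from itertools import combinations

def ring_lattice(n, k):
    """Each node is tied to its k nearest neighbours on a ring (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, p, seed=0):
    """Move each tie to a random new endpoint with probability p."""
    rng = random.Random(seed)
    n = len(adj)
    for v in range(n):
        for w in sorted(adj[v]):
            if w > v and rng.random() < p:
                choices = [u for u in range(n) if u != v and u not in adj[v]]
                if choices:
                    u = rng.choice(choices)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj

def average_path_length(adj):
    total = pairs = 0
    for s in adj:
        dist, frontier = {s: 0}, [s]
        while frontier:  # breadth-first search from s
            nxt = []
            for v in frontier:
                for w in adj[v]:
                    if w not in dist:
                        dist[w] = dist[v] + 1
                        nxt.append(w)
            frontier = nxt
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(adj):
    values = []
    for v, nbrs in adj.items():
        if len(nbrs) < 2:
            values.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(links / (len(nbrs) * (len(nbrs) - 1) / 2))
    return sum(values) / len(values)

for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 6), p)
    print(p, round(average_path_length(g), 2), round(average_clustering(g), 2))
```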


Small worlds demo

http://www-personal.umich.edu/~ladamic/netlearn/NetLogo412/SmallWorldDiffusionSIS.html


Interactive Summary

• The biggest advantage I can gain by using SNA is…

• The most important fact about SNA for me is…

• The concept that made the most sense for me today was…

• The biggest danger in using SNA is…

• If I use SNA in the future, I will try to make sure that…

• If I use SNA in my next project, I will use it for…

• I should change my perspective on networks in considering…

• I have changed my opinion about SNA, finding out that…

• I missed today that…

• Before attending this seminar I didn't know that…

• I wish we could have covered…

• If I forget almost everything that I learned today, I will still remember…

• The most important thing today for me was…

Thanks for your attention!

Questions & Discussion