Exploratory social network analysis with pajek

Posted on 11-May-2015


Exploratory Social Network Analysis with Pajek

Fundamentals in Social Network Analysis by Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj

Slides created by Thomas Plotkowiak

26.08.2010

Agenda

1. Fundamentals

2. Attributes and Relations

3. Cohesion

4. Sentiments and Friendship

5. Affiliations

6. Core - Periphery

1 - Fundamentals

Fundamentals

Sociometry studies interpersonal relations. Society is not an aggregate of individuals and their characteristics (as statisticians assume) but a structure of interpersonal ties. Therefore, the individual is not the basic social unit. The social atom consists of an individual and his or her social, economic, or cultural ties. Social atoms are linked into groups, and, ultimately, society consists of interrelated groups.

Example of a Sociogram

Choices of twenty-six girls living in one dormitory at a New York state training school.

The girls were asked to choose the girls they liked best as their dining-table partners.

The main goal of social network analysis is detecting and interpreting patterns of social ties among actors.

Exploratory Social Network Analysis

It consists of four parts:

1. The definition of a network

2. Network manipulation

3. Determination of structural features

4. Visual inspection

Network Definition

A graph is a set of vertices and a set of lines between pairs of vertices.

A vertex is the smallest unit in a network. In SNA it represents an actor (girl, organization, country…).

A line is a tie between two vertices in a network. In SNA it can be any social relation.

A loop is a special kind of line, namely, a line that connects a vertex to itself.

Network Definition II

A directed line is called an arc, whereas an undirected line is an edge.

A directed graph or digraph contains one or more arcs. An undirected graph contains no arcs (all of its lines are edges).

A simple undirected graph contains neither multiple edges nor loops.

A simple directed graph contains no multiple arcs.
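These definitions can be checked mechanically. The following Python sketch assumes that a graph is given as a list of (u, v, directed) triples. The function `describe_graph` and this representation are illustrative only and do not reflect Pajek's internals.

```python
from collections import Counter

def describe_graph(lines):
    """Classify a graph given as (u, v, directed) triples."""
    has_arcs = any(directed for _, _, directed in lines)
    loops = sum(1 for u, v, _ in lines if u == v)
    # Arcs keep their orientation. Edges are unordered, so the edges
    # (1, 2) and (2, 1) count as the same pair of vertices.
    keys = Counter(
        (u, v, True) if directed else (min(u, v), max(u, v), False)
        for u, v, directed in lines
    )
    multiple = any(count > 1 for count in keys.values())
    if has_arcs:
        # A simple directed graph has no multiple arcs (loops are allowed).
        return {"kind": "directed", "loops": loops,
                "multiple": multiple, "simple": not multiple}
    # A simple undirected graph has neither multiple edges nor loops.
    return {"kind": "undirected", "loops": loops,
            "multiple": multiple, "simple": not multiple and loops == 0}
```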

Network Definition III

A network consists of a graph and additional information on the vertices or the lines of the graph.

Application

1. We use the computer program Pajek (Slovenian for "spider") to analyze and draw social networks (get it from http://vlado.fmf.uni-lj.si/pub/networks/).

Number of vertices

Specific vertex and orientation

List of Arcs
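The three callouts above appear to annotate a Pajek network file (.net). Such a file has a `*Vertices` line giving the number of vertices, one line per vertex with its number and label, and a list of arcs under `*Arcs` (undirected lines go under `*Edges`). The sketch below reads only this basic layout. `read_pajek_net` is a hypothetical helper, not Pajek's own reader, and it ignores vertex coordinates and other optional fields.

```python
import shlex

def read_pajek_net(text):
    """Parse the basic *Vertices / *Arcs / *Edges layout of a .net file.

    Returns (labels, lines): labels maps vertex number -> label, and lines
    holds (tail, head, value, directed) tuples.
    """
    labels, lines = {}, []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section header such as "*Vertices 3", "*Arcs" or "*Edges".
            section = line.split()[0].lower()
            continue
        fields = shlex.split(line)  # keeps quoted labels like "Ada" intact
        if section == "*vertices":
            labels[int(fields[0])] = fields[1] if len(fields) > 1 else fields[0]
        elif section in ("*arcs", "*edges"):
            # A missing line value is read as 1.
            value = float(fields[2]) if len(fields) > 2 else 1.0
            lines.append((int(fields[0]), int(fields[1]), value,
                          section == "*arcs"))
    return labels, lines
```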

Pajek Main Screen

Manipulation

Suppose we want to change reciprocated choices in the dining-table partners network into edges.
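Here is a sketch of this manipulation. It assumes the choices are stored as (chooser, chosen) pairs. `reciprocated_to_edges` is a hypothetical helper that illustrates the idea; it is not Pajek's implementation. After the transformation, the total number of lines is the number of remaining arcs plus the number of new edges.

```python
def reciprocated_to_edges(arcs):
    """Replace every reciprocated pair of arcs (u->v and v->u) by one edge.

    Returns (remaining_arcs, edges). Unreciprocated arcs and loops stay
    arcs. Each edge is a frozenset, i.e. an unordered pair of vertices.
    """
    arc_set = set(arcs)
    edges, remaining = set(), []
    for u, v in arcs:
        if u != v and (v, u) in arc_set:
            edges.add(frozenset((u, v)))  # both arcs map to the same edge
        else:
            remaining.append((u, v))
    return remaining, edges
```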

Calculation

Suppose we want to calculate the total number of lines:

Visualization

Automatic Drawing

• Layout by Energy: Move vertices to locations that minimize the variation in line length. (Imagine that the lines are springs pulling vertices together, though never too close.)

• Energy Layouts:

– Kamada-Kawai (computationally expensive)

– Fruchterman-Reingold (faster)

• Draw by Hand
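Below is a simplified spring embedder in the spirit of Fruchterman-Reingold. Every pair of vertices repels, every line pulls its end vertices together, and a shrinking "temperature" limits how far a vertex may move per step. Several details are assumptions of this sketch: the start in the unit square, the ideal distance k = 1/sqrt(n), the initial temperature 0.1, the cooling factor 0.95, and the absence of a drawing frame. Pajek's own implementation will differ in detail.

```python
import math
import random

def fruchterman_reingold(n, edges, iterations=200, seed=0):
    """Return n (x, y) positions for vertices 0..n-1 joined by `edges`."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    k = 1.0 / math.sqrt(n)   # ideal distance between vertices
    temperature = 0.1        # maximum displacement per iteration
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of vertices: k^2 / d
        for u in range(n):
            for v in range(u + 1, n):
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along lines (the "springs"): d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex along its net force, at most `temperature` far.
        for v in range(n):
            length = math.hypot(disp[v][0], disp[v][1])
            if length > 0:
                step = min(length, temperature)
                pos[v][0] += disp[v][0] / length * step
                pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95  # cool down so that the layout settles
    return [tuple(p) for p in pos]
```

Kamada-Kawai instead minimizes an energy based on graph-theoretic distances between all pairs of vertices. This accounts for its higher computational cost.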

Exporting

VRML, MDL, Kinemages

Bitmap

SVG, EPS

2 - Attributes and Relations

Example – The world system

In 1974, Immanuel Wallerstein introduced the concept of a capitalist world system, which came into existence in the sixteenth century. This system is characterized by a world economy that is stratified into a core, a semiperiphery, and a periphery. Countries owe their wealth or poverty to their position in the world economy. The core, Wallerstein argues, exists because it succeeds in exploiting the periphery and, to a lesser extent, the semiperiphery. The semiperiphery profits from being an intermediary between the core and the periphery.

Which countries belong to the core, semiperiphery or periphery?

The world system network

• Network contains 80 countries with attributes:

• continent

• world system position in 1980

• gross domestic product per capita in U.S. dollars in 1995

• The arcs represent imports (of metal) into one country from another.

Partition

A partition of a network is a classification or clustering of the vertices in the network such that each vertex is assigned to exactly one class or cluster.

Partition Load & Edit

• File > Partition > Read (.clu File)

• File> Partition> Edit

Info on Partition Distribution

• Info > Partition

Application – Draw Partition

• Draw > Draw Partition

Reduction of a Network

• Operations > Extract from Network (select class 6)

To extract a subnetwork from a network, select a subset of its vertices and all lines that are only incident with the selected vertices.
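A partition can be stored as a dictionary from vertex to class, with each vertex in exactly one class. Extraction then keeps the vertices of the selected classes and only those lines whose two end vertices are both selected. `extract_subnetwork` is a hypothetical name used for illustration.

```python
def extract_subnetwork(partition, lines, selected_classes):
    """partition: dict vertex -> class; lines: iterable of (u, v) pairs.

    Returns the kept vertices and the lines among them.
    """
    keep = {v for v, cls in partition.items() if cls in selected_classes}
    sub_lines = [(u, v) for u, v in lines if u in keep and v in keep]
    return keep, sub_lines
```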

Partition – Local View

1. Partitions > Extract Second from First

Global View

1. Operations > Shrink Network

To shrink a network, replace a subset of its vertices by one new vertex that is incident to all lines that were incident with the vertices of the subset in the original network.

Contextual View

• Operations > Shrink Network (Don't shrink class 6)

In a contextual view, all classes are shrunk except the one in which you are particularly interested.
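Shrinking can be sketched in the same way. In this sketch, lines inside a class become loops on the new class vertex, and parallel lines are counted, with the count kept as a line value. Passing `keep_class` leaves that class unshrunk, which yields the contextual view. The function name and the counting of parallel lines are assumptions of the sketch; they do not describe Pajek's output.

```python
from collections import Counter

def shrink_network(partition, lines, keep_class=None):
    """Replace each class by one vertex; returns {(from, to): number of lines}."""
    def node(v):
        cls = partition[v]
        if cls == keep_class:
            return ("vertex", v)    # contextual view: keep this vertex as is
        return ("class", cls)       # otherwise merge it into its class vertex
    return dict(Counter((node(u), node(v)) for u, v in lines))
```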

Vectors and Coordinates Load & Edit

• File > Vector > Read (.vec File)

• File> Vector > Edit

A vector assigns a numerical value to each vertex in a network.

Info on a Vector

• Info > Vector

Vector → Partition

• Vector > Make Partition > by Truncating (Abs)

• Vector > Make Partition by Intervals > First Threshold and Step

• Vector > Make Partition by Intervals > Selected Thresholds

Draw Vector & Partition

Global View & Vectors

1. Vector > Shrink Vector (Sum)
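The following sketch turns a vector into a partition by thresholds, which may be selected by hand or generated from a first threshold and a step. It then sums the vector within each class, as the vector of a shrunk network would. One assumption needs flagging: a value equal to a threshold is placed in the lower class (inclusive upper bounds). Check Pajek's documentation before relying on that. The function names are hypothetical.

```python
import bisect
from collections import defaultdict

def thresholds_from_step(first, step, count):
    """Thresholds first, first + step, first + 2 * step, ... (count of them)."""
    return [first + i * step for i in range(count)]

def partition_by_thresholds(vector, thresholds):
    """Class 1: value <= t1; class 2: t1 < value <= t2; ...;
    the last class holds values above the highest threshold."""
    ts = sorted(thresholds)
    return {v: bisect.bisect_left(ts, x) + 1 for v, x in vector.items()}

def shrink_vector_sum(vector, partition):
    """Sum the vector values of all vertices in each class."""
    totals = defaultdict(float)
    for v, x in vector.items():
        totals[partition[v]] += x
    return dict(totals)
```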

Network Analysis and Statistics

• Example: Crosstabulation of two partitions and some

measures of association between the classifications

represented by two partitions.

• Partition > Info > Cramer's , Rajski

• Cramer's V measures the statistical dependence between two classifications.

• Rajski's indices measure the degree to which the information in one classification is preserved in the other classification.
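Both measures can be computed from the crosstabulation of two partitions. Cramer's V is sqrt(chi^2 / (n (k - 1))), where k is the smaller number of classes. For Rajski, this sketch computes only the symmetric information index I(C1; C2) / H(C1, C2), which equals 1 minus Rajski's distance. Pajek also reports directional variants, and those are not reproduced here. Both partitions are assumed to be dictionaries over the same vertices.

```python
import math
from collections import Counter

def cramers_v(p1, p2):
    """Cramer's V for two partitions (dicts vertex -> class)."""
    n = len(p1)
    joint = Counter((p1[v], p2[v]) for v in p1)
    rows, cols = Counter(p1.values()), Counter(p2.values())
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols))
    return math.sqrt(chi2 / (n * (k - 1))) if k > 1 else 0.0

def rajski_symmetric(p1, p2):
    """I(C1;C2) / H(C1,C2): 1 if both partitions carry the same
    information, 0 if they are statistically independent."""
    n = len(p1)

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h1 = entropy(Counter(p1.values()))
    h2 = entropy(Counter(p2.values()))
    h12 = entropy(Counter((p1[v], p2[v]) for v in p1))
    return (h1 + h2 - h12) / h12 if h12 > 0 else 1.0
```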

END OF LESSON 1

3 - Cohesion

Cohesive Subgroups

Cohesive subgroups: We hypothesize that cohesive subgroups are the basis for solidarity, shared norms, identity and collective behavior. Perceived similarity, for instance membership of a social group, is expected to promote interaction. We expect similar people to interact a lot, at least more often than they interact with dissimilar people. This phenomenon is called homophily: "Birds of a feather flock together."

Example – Families in Haciendas (1948)

Each arc represents "frequent visits" from one family to another.

Density & Degree I

Density is the number of lines in a simple network, expressed as a proportion of the maximum possible number of lines.

A complete network is a network with maximum density.

The degree of a vertex is the number of lines incident with it.

Density & Degree II

Two vertices are adjacent if they are connected by a line.

The indegree of a vertex is the number of arcs it receives.

The outdegree is the number of arcs it sends.

To symmetrize a directed network is to replace unilateral and bidirectional arcs by edges.
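Here is a small sketch of these definitions for a simple network. It assumes arcs are (sender, receiver) pairs. The maximum number of lines is n(n - 1) in a directed network and n(n - 1)/2 in an undirected one. This sketch drops loops, as a simple undirected graph requires. The helper names are illustrative.

```python
from collections import Counter

def density(n, number_of_lines, directed):
    """Lines as a proportion of the maximum possible number in a simple network."""
    max_lines = n * (n - 1) if directed else n * (n - 1) / 2
    return number_of_lines / max_lines

def in_out_degrees(arcs):
    """Indegree = arcs received, outdegree = arcs sent."""
    indeg, outdeg = Counter(), Counter()
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

def symmetrize(arcs):
    """Replace unilateral and bidirectional arcs by edges (unordered pairs).
    A reciprocated pair becomes a single edge; loops are dropped here."""
    return {frozenset((u, v)) for u, v in arcs if u != v}

def degree(edges):
    """Number of edges incident with each vertex."""
    deg = Counter()
    for edge in edges:
        for v in edge:
            deg[v] += 1
    return deg
```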

Computing Density

• Info > Network > General

Computing Degree

• Net > Transform > Arcs → Edges > All

• Net > Partitions > Degree > {In, Out, All}

Components

A semiwalk from vertex u to vertex v is a sequence of lines such that the end vertex of one line is the starting vertex of the next line and the sequence starts at vertex u and ends at vertex v.

A walk is a semiwalk with the additional condition that none of its lines are an arc of which the end vertex is the arc's tail.

Note that v5 → v3 → v4 → v5 → v3 is also a walk to v3.

Paths

A semipath is a semiwalk in which no vertex in between the first and last vertex of the semiwalk occurs more than once.

A path is a walk in which no vertex in between the first and last vertex of the walk occurs more than once.

Connectedness

A network is (weakly) connected if each pair of vertices is connected by a semipath.

A network is strongly connected if each pair of vertices is connected by a path.

This network is not connected because v2 is isolated.

Connected Components

A (weak) component is a maximal (weakly) connected subnetwork.

A strong component is a maximal strongly connected subnetwork.

v1, v3, v4, v5 are a weak component; v3, v4, v5 are a strong component.

Example Strong Components

1. Net > Components > {Strong, Weak}
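Weak components ignore the direction of arcs, while strong components require paths in both directions. This sketch finds weak components with a depth-first search and strong components with Kosaraju's two-pass algorithm. Kosaraju's algorithm is one standard choice among several, not necessarily what Pajek uses. The function names are illustrative.

```python
from collections import defaultdict

def weak_components(vertices, arcs):
    """Maximal subnetworks connected by semipaths (arc direction ignored)."""
    nbrs = defaultdict(set)
    for u, v in arcs:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in nbrs[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def strong_components(vertices, arcs):
    """Maximal subnetworks connected by paths (Kosaraju's algorithm)."""
    out, inc = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        out[u].append(v)
        inc[v].append(u)
    # Pass 1: iterative DFS that records vertices in order of completion.
    order, seen = [], set()
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(out[s]))]
        while stack:
            u, successors = stack[-1]
            for w in successors:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(out[w])))
                    break
            else:
                stack.pop()
                order.append(u)
    # Pass 2: DFS along reversed arcs, in reverse order of completion.
    assigned, comps = set(), []
    for s in reversed(order):
        if s in assigned:
            continue
        comp, stack = set(), [s]
        assigned.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in inc[u]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps
```

Suppose the arcs are v1→v3, v3→v4, v4→v5 and v5→v3, with v2 isolated. These arcs are hypothetical, chosen to match the example. With them, the weak components are {v1, v3, v4, v5} and {v2}, and the strong components are {v3, v4, v5}, {v1} and {v2}.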

Cliques and Complete Subnetworks

A clique is a maximal complete subnetwork containing three vertices or more (cliques can overlap).

v1,v6,v5 is a clique

v2,v4,v5 is not a clique

v2,v3,v4,v5 is a clique
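Maximal cliques can be enumerated with the Bron-Kerbosch algorithm with pivoting. This is a standard method and not necessarily the one Pajek uses. Following the definition above, the sketch reports only cliques with at least three vertices.

```python
def cliques(vertices, edges, min_size=3):
    """Maximal complete subnetworks with at least min_size vertices."""
    nbrs = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: excluded vertices
        if not p and not x:
            if len(r) >= min_size:   # r cannot be extended: it is maximal
                found.append(r)
            return
        pivot = max(p | x, key=lambda u: len(nbrs[u] & p))
        for v in list(p - nbrs[pivot]):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(vertices), set())
    return found
```

Take hypothetical edges v1-v5, v1-v6 and v5-v6, plus all pairs among v2, v3, v4 and v5, chosen to match the three statements above. With these edges, the sketch returns {v1, v5, v6} and {v2, v3, v4, v5}. It does not return {v2, v4, v5}, which is complete but not maximal.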

n-Clique & n-Clan

n-Clique: An n-clique is a maximal subgraph in which every pair of vertices is at distance at most n, with distances measured in the whole analyzed graph. A clique is an n-clique with n = 1.

n-Clan: An n-clan is an n-clique in which every pair of vertices is also at distance at most n within the resulting subgraph.

(Figure: a 2-clique and a 2-clan)

n-Clans & n-Cliques

2-Clans: 123, 234, 345, 456, 561, 612

2-Cliques: 123, 234, 345, 456, 561, 612 and 135, 246
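The listed 2-clans and 2-cliques match those of a six-vertex cycle 1-2-3-4-5-6-1, so the sketch below assumes that network. It enumerates n-cliques by brute force over all vertex subsets, which is feasible only for tiny examples, using distances in the whole network. It then keeps as n-clans those n-cliques whose members are also within distance n inside the subgraph. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_distances(nbrs, source, allowed):
    """Shortest-path distances from source, walking only through `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in nbrs[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def n_cliques(nbrs, n):
    """Maximal sets whose members are pairwise at distance <= n in the whole network."""
    vertices = sorted(nbrs)
    dist = {v: bfs_distances(nbrs, v, set(vertices)) for v in vertices}

    def close(a, b):
        return dist[a].get(b, float("inf")) <= n

    candidates = [set(s) for size in range(len(vertices), 0, -1)
                  for s in combinations(vertices, size)
                  if all(close(a, b) for a, b in combinations(s, 2))]
    return [s for s in candidates if not any(s < t for t in candidates)]

def n_clans(nbrs, n):
    """n-cliques whose members are also within distance n inside the subgraph."""
    return [s for s in n_cliques(nbrs, n)
            if all(bfs_distances(nbrs, a, s).get(b, float("inf")) <= n
                   for a, b in combinations(sorted(s), 2))]

# Assumed network: the cycle 1-2-3-4-5-6-1
ring = {v: {v % 6 + 1, (v - 2) % 6 + 1} for v in range(1, 7)}
```

For this cycle, `n_cliques(ring, 2)` yields the eight sets listed as 2-cliques. `n_clans(ring, 2)` drops 135 and 246, whose members have no ties to each other inside the subgraph.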

k-Plexes

k-Plex: A k-plex is a maximal subgraph with g_s vertices in which each vertex is connected to at least g_s - k of the other vertices.

2-Plexes: 1234, 2345, 3456, 4561, 5612, 6123

In general, k-plexes are more robust than cliques and clans.

Overview Subgroups

(Figure: five small networks on the vertices 1, 2, 3, 4)

• Network 1: 2 components

• Network 2: 1 component; 2 2-clans (341, 412); 2 2-cliques (341, 412)

• Network 3: 1 component; 1 2-clan (124); 1 2-clique (124)

• Network 4: 1 component; 1 2-clan (1234); 1 2-clique (1234); 1 2-plex (1234)

• Network 5: 1 component; 1 2-clan (1234); 1 2-clique (1234); 1 2-plex (1234); 1 clique

Overview Group Concepts

• 1-clique, 1-clan and 1-plex are identical.

• An n-clan is always included in an n-clique.

(Figure: nested group concepts: component, 2-clique, 2-clan, 2-plex, clique)

Finding Cliques

• Example: We are looking for occurrences of triads

• Nets > First Network, Second Network

• Nets > Fragment (1 in 2) > Find

The figure shows the hierarchy for the example of overlapping complete triads. There are five complete triads; each of the triads is represented by a gray vertex. Each triad consists of three vertices.

Finding Social Circles

• Partitions > First, Second

• Partitions > Extract Second from First

We have found three social circles.

k-Cores

A k-core is a maximal subnetwork in which each vertex has at least degree k within the subnetwork.

k-Cores

k-cores are nested, which means that a vertex in a 3-core is also part of a 2-core, but not all members of a 2-core belong to a 3-core.

k-Cores Application

• K-cores help to detect cohesive subgroups by removing the lowest k-cores from the network until the network breaks up into relatively dense components.

• Net > Partitions > Core >{Input, Output, All}
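A simple way to compute core numbers is to repeatedly remove a vertex of smallest remaining degree. The largest degree seen so far at removal time is that vertex's core number, and the k-core is the set of vertices with core number at least k. This quadratic sketch covers undirected networks only (the All option). The Input and Output options presumably count only incoming or outgoing lines instead. The names are hypothetical.

```python
def core_numbers(nbrs):
    """nbrs: dict vertex -> set of neighbors (simple undirected network)."""
    degree = {v: len(ns) for v, ns in nbrs.items()}
    remaining = set(nbrs)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # smallest remaining degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in nbrs[v]:
            if w in remaining:
                degree[w] -= 1   # w loses a neighbor inside the subnetwork
    return core

def k_core(nbrs, k):
    """Vertices of the k-core: those with core number >= k."""
    return {v for v, c in core_numbers(nbrs).items() if c >= k}
```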

4 - Sentiments and Friendship

Balance Theory

Franz Heider (1940): A person (P) feels uncomfortable when he or she disagrees with his or her friend (O) on a topic (X). P feels an urge to change this imbalance. He can adjust his opinion, change his affection for O, or convince himself that O is not really opposed to X.

Signed Graphs

{O,P,X} form a cycle. All balanced cycles contain an even number

of negative lines or no negative lines at all.

A signed graph is a graph in which each line carries either a positive or a negative sign.

Signed Graphs with Arcs

A cycle is a closed path.

A semicycle is a closed semipath.

A (semi-)cycle is balanced if it does not contain an odd number of negative arcs.

Balanced Networks

A signed graph is balanced if all of its (semi-)cycles are balanced.

A signed graph is balanced if it can be partitioned into two clusters such that all positive ties are contained within the clusters and all negative ties are situated between the clusters.
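The second definition suggests a test: try to place the vertices in two clusters so that positive ties stay inside and negative ties cross over. If a semicycle with an odd number of negative ties makes this impossible, report failure. The sketch treats arcs as undirected lines (semicycles) and assumes signs are +1 or -1. `balanced_split` is a hypothetical name.

```python
from collections import deque

def balanced_split(vertices, signed_lines):
    """Return dict vertex -> cluster (0 or 1), or None if not balanced."""
    nbrs = {v: [] for v in vertices}
    for u, v, sign in signed_lines:
        nbrs[u].append((v, sign))
        nbrs[v].append((u, sign))
    side = {}
    for s in vertices:
        if s in side:
            continue
        side[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w, sign in nbrs[u]:
                # Positive tie: same cluster. Negative tie: the other cluster.
                want = side[u] if sign > 0 else 1 - side[u]
                if w not in side:
                    side[w] = want
                    queue.append(w)
                elif side[w] != want:
                    return None  # a semicycle with an odd number of negative ties
    return side
```

Take Heider's triad with a positive P-O tie and negative ties from both P and O to X. Here the sketch puts P and O in one cluster and X in the other. If only the P-X tie is negative, it returns None.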

Clusterability = Generalized Balance

A cycle or a semicycle is clusterable if it does not contain exactly one negative arc.

A signed graph is clusterable if it can be partitioned into clusters such that all positive ties are contained within clusters and all negative ties are situated between clusters.

Example – Community in a New England monastery

Options > Values of Lines > Similarities

Young Turks (1), Loyal Opposition (2), Outcasts (3), Interstitial Group (4)

Issues on Clustering

1. An optimization may find several solutions that fit equally well. It is up to the researcher to select one or present all.

2. Unless a solution is known to be optimal, there is no guarantee that no better solution exists.

3. Different starting options yield different results. (It is hard to tell in advance which number of clusters will yield the lowest error score.)

4. Negative arcs within a cluster are often tolerated less than positive arcs between clusters.

Calculating Clustering

1. Partition > Create Random Partition (ex. 3 Clusters)

2. Operations > Balance (alpha 0.5)
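As we understand it, the error score in this kind of optimization weights negative ties within clusters by alpha and positive ties between clusters by 1 - alpha. The sketch computes that score. It then improves a random starting partition by moving single vertices to another cluster as long as the score drops. This only reaches a local optimum, which is one source of the issues listed above. The sketch counts ties rather than summing line values, and it is not Pajek's exact procedure. The function names are hypothetical.

```python
import random

def error_score(partition, signed_arcs, alpha=0.5):
    """alpha * (negative ties within clusters)
    + (1 - alpha) * (positive ties between clusters)."""
    neg_within = sum(1 for u, v, s in signed_arcs
                     if s < 0 and partition[u] == partition[v])
    pos_between = sum(1 for u, v, s in signed_arcs
                      if s > 0 and partition[u] != partition[v])
    return alpha * neg_within + (1 - alpha) * pos_between

def relocate(vertices, signed_arcs, n_clusters, alpha=0.5, seed=0):
    """Random start, then single-vertex moves while the error score drops."""
    rng = random.Random(seed)
    partition = {v: rng.randrange(n_clusters) for v in vertices}
    best = error_score(partition, signed_arcs, alpha)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            for c in range(n_clusters):
                if c == partition[v]:
                    continue
                old = partition[v]
                partition[v] = c
                score = error_score(partition, signed_arcs, alpha)
                if score < best:
                    best, improved = score, True   # keep the move
                else:
                    partition[v] = old             # undo the move
    return partition, best
```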

Development in Time

List of Vertices with their presence

List of Arcs with their presence

Loading and Drawing Networks in Time

• Net > Transform > Generate in Time

• Draw > {Previous, Next}

Network consists of 3 choices, hence the bigger errors.

We see a tendency towards clusterability.

BREAK LESSON 2

5 - Affiliations

Example – Corporate interlocks in Scotland at the beginning of the twentieth century (1904-5)

A fragment of the Scottish directorates network. Companies are classified according to industry: oil & mining, railway, engineering...

Directors (grey) and Firms (black)

Two-Mode and One-Mode Networks

In a one-mode network, each vertex can be related to each other vertex.

In a two-mode network, vertices are divided into two sets and vertices can only be related to vertices in the other set.

• The degree of a firm specifies the number of its multiple directors, also known as the size of an event.

• The degree of a director equals the number of boards he sits on, also known as the rate of participation of an actor.

• Also note that some measures must be computed differently for two-mode networks.

Transforming two-mode networks into one-mode networks

Whenever two firms share a director in the two-mode network, there is a line between

them in the one-mode network.

Transforming two-mode networks into one-mode networks II

The events of the two-mode network are represented by lines and loops in the one-mode network of actors. J.S. Tait meets W. Sanderson in board meetings of two companies.
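Both projections can be sketched from a dictionary of memberships, such as director to firms. The value of a line is the number of shared memberships. The loop of a vertex carries its degree in the two-mode network: the rate of participation for a director, or the size of the event for a firm. `one_mode` and `invert` are hypothetical helpers.

```python
from collections import Counter
from itertools import combinations

def one_mode(memberships):
    """memberships: dict vertex -> set of vertices of the other mode.

    Line (u, v): number of shared memberships.
    Loop (u, u): u's degree in the two-mode network.
    """
    lines = Counter()
    for u, others in memberships.items():
        if others:
            lines[(u, u)] = len(others)
    for u, v in combinations(sorted(memberships), 2):
        shared = len(memberships[u] & memberships[v])
        if shared:
            lines[(u, v)] = shared
    return lines

def invert(memberships):
    """Turn director -> firms into firm -> directors (or vice versa)."""
    inverted = {}
    for u, others in memberships.items():
        for o in others:
            inverted.setdefault(o, set()).add(u)
    return inverted
```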

Transforming two-mode networks into one-mode networks III

• Net > Transform > 2-Mode to 1-Mode {Rows,Columns}

• Net > Transform > 2-Mode to 1-Mode > Include Loops,

Multiple Lines

• Info > Network > Line Values

m-Slices

An m-slice is a maximal subnetwork containing the lines with a multiplicity equal to or greater than m and the vertices incident with these lines.

2-Slice in the network of Scottish firms

Computing m-Slices

• Net > Partitions > Values Core {use max}

• Net > Partitions > Values Core > First Threshold and Step

• Net > Transform > Remove >lines with value > lower than 2

• Operations > Extract from Network > Partition

• Net > Components > Weak
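An m-slice can be sketched directly from its definition. Keep the lines whose value (multiplicity) is at least m, keep the vertices incident with them, and split the result into weak components. Loops are ignored. A line with value 3 also has a value of at least 2, so every 3-slice lies inside the 2-slice. `m_slice` is a hypothetical helper, and line values are assumed to be stored in a dictionary keyed by vertex pairs.

```python
from collections import defaultdict

def m_slice(line_values, m):
    """line_values: dict (u, v) -> multiplicity.

    Returns the kept lines and the weak components they form.
    """
    kept = [(u, v) for (u, v), value in line_values.items()
            if u != v and value >= m]
    nbrs = defaultdict(set)
    for u, v in kept:
        nbrs[u].add(v)
        nbrs[v].add(u)
    components, seen = [], set()
    for start in nbrs:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in nbrs[u] - seen:
                seen.add(w)
                stack.append(w)
        components.append(comp)
    return kept, components
```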

3-slice

m-slices 3D

• Layers > Type of Layout > 3D

• Layers > In z direction

• Options> Scroll Bar > On

Drawing m-Slices

1. Layers > Type of Layout > 3D

2. Layers > In z direction

3. Options> Scroll Bar >

6 – Center and Periphery


Example – Communication ties within a sawmill

H – Hispanic

E – English

M – Mill

P – Planer section

Y – Yard

Vertex labels indicate the ethnicity and the type of work of each employee; for example, HP-10 is a Hispanic (H) working in the planer section (P).

Distance

• The larger the number of sources accessible to a person, the easier it is to obtain information. Social ties constitute social capital that may be used to mobilize social resources.

• The simplest indicator of a vertex's centrality is the number of its neighbors (its degree in a simple undirected network).

Degree centrality I

The degree centrality of a vertex is its degree.

The degree centralization of a network is the variation in the degrees of vertices divided by the maximum degree variation which is possible in a network of the same size.
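This sketch reads "variation" as Freeman's sum of differences from the highest degree. That is an assumption, since the slide gives no formula. In a simple undirected network with n vertices, the maximum possible sum is reached by a star and equals (n - 1)(n - 2). The sketch needs n ≥ 3. A star scores 1 and a cycle scores 0.

```python
def degree_centralization(nbrs):
    """nbrs: dict vertex -> set of neighbors in a simple undirected network."""
    n = len(nbrs)
    degrees = [len(ns) for ns in nbrs.values()]
    top = max(degrees)
    # Sum of differences to the highest degree, relative to the star's sum.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))
```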

Degree Centrality II

1. //TODO missing…

Closeness Centrality

1. Closeness centrality: A person is central if he or she is very close to all others with respect to the network relation. Such a central position increases the efficiency with which an actor can act in the network. Such an actor can receive and spread information quickly.

$$C_c(n_i) = \frac{g - 1}{\sum_{j=1}^{g} d(n_i, n_j)}$$

where d(n_i, n_j) is the distance between n_i and n_j and g is the number of vertices.

Closeness Centrality

(Figure: example network with vertices 1 to 11)

$$C_c(n_3) = \frac{11 - 1}{25} = 0.40$$

n_i   n_j   d(n_i, n_j)
3     1     1
3     2     1
3     4     1
3     5     1
3     6     2
3     7     2
3     8     3
3     9     4
3     10    5
3     11    5
Sum         25

n     C_c
1     0.27
2     0.29
3     0.40
4     0.45
5     0.45
6     0.45
7     0.45
8     0.45
9     0.37
10    0.27

Note: Only symmetric ties can be considered here, and only connected networks.
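Closeness can be computed with one breadth-first search per vertex. The sketch divides g - 1 by the sum of the distances from the vertex to all other vertices, as on this slide. It assumes symmetric ties stored as neighbor sets and raises an error for disconnected networks, in line with the note above. `closeness` is a hypothetical name.

```python
from collections import deque

def closeness(nbrs, source):
    """(g - 1) / sum of shortest-path distances from source to all others."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in nbrs[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) < len(nbrs):
        raise ValueError("closeness requires a connected network")
    return (len(nbrs) - 1) / sum(dist.values())
```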

Centralization

1. Centralization ≠ centrality.

2. Centralization is a structural property of the group, not a relational property of individual actors.

3. Index of centralization: Compute the differences between the centrality of the most central actor and the centrality of every other actor, then sum these differences over all other actors.

$$C = \frac{\sum_{i=1}^{g} \left[ C(n^*) - C(n_i) \right]}{g - 1}$$

where n^* is the most central actor.

Centralization

1. This index takes a high value only if exactly one actor is central, not if several actors form a center.

2. Only a comparison of a group's data across several points in time allows meaningful interpretations.

Betweenness centrality

1. Betweenness centrality: Persons (cutpoints) who connect two otherwise unconnected subpopulations are actors with high betweenness centrality. (Assumption: only the shortest connections are used for communication.)

2. For each pair of actors j, k ≠ i, determine the proportion of all shortest paths connecting j and k that pass through actor i. These proportions are then averaged over all pairs j, k ≠ i.

$$C_b(n_i) = \frac{\sum_{j \neq k \neq i} g_{jk}(n_i) / g_{jk}}{(g - 1)(g - 2)}$$

where g_{jk} is the number of shortest paths between j and k, and g_{jk}(n_i) is the number of those paths that pass through n_i.

Betweenness centrality

1. Note: Some actors may not be reachable by others and yet be able to reach the others themselves.

(Figure: example network with vertices 1 to 11)

n     1   2   3     4     5     6     7     8     9     10  11
C_b   0   0   0.37  0.22  0.22  0.22  0.22  0.48  0.37  0   0
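Brandes' algorithm counts shortest paths with one breadth-first search per source. It then accumulates, for every vertex, the fractions g_jk(n_i)/g_jk of shortest paths that pass through it. The sketch divides by (g - 1)(g - 2), the number of ordered pairs j, k distinct from i, so symmetric and directed networks are treated alike. Pairs with no path between them contribute nothing, as in the note above. This normalization is an assumption; check it against the software you compare with.

```python
from collections import deque

def betweenness(nbrs):
    """nbrs: dict vertex -> vertices it has lines (or arcs) to; needs g >= 3."""
    vertices = list(nbrs)
    score = dict.fromkeys(vertices, 0.0)
    for s in vertices:
        # BFS from s: sigma[v] = number of shortest paths from s to v
        stack, preds = [], {v: [] for v in vertices}
        sigma = dict.fromkeys(vertices, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate path fractions, farthest vertices first.
        delta = dict.fromkeys(vertices, 0.0)
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    g = len(vertices)
    return {v: score[v] / ((g - 1) * (g - 2)) for v in vertices}
```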

Degree Prestige

1. Prestige can be meaningfully measured as the relative indegree of an actor (degree prestige) [Wasserman and K. Faust 1994: 202]:

$$P_D(n_j) = \frac{x_{+j}}{g - 1}, \qquad x_{+j} = \sum_i x_{ij}$$

where n_j is actor j, x_{ij} is the matrix entry in row i, column j, and g is the number of vertices in the network.

Prestige Example

(Figure: example network with vertices 1 to 11)

Prestige of vertex 3:

$$P_D(n_3) = \frac{x_{+3}}{11 - 1} = \frac{2}{10} = 0.2$$

In general: Prestige is independent of group size, and its value lies between 0 and 1 (1 for the center of a star).
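Degree prestige only needs indegrees. The sketch assumes the choices are given as (i, j) arcs meaning that i chooses j. It ignores loops and divides each indegree by g - 1. In a network of 11 vertices, a vertex chosen by two others gets 2/10 = 0.2, and the center of a star that everyone chooses gets 1. `degree_prestige` is a hypothetical name.

```python
def degree_prestige(arcs, vertices):
    """Indegree x_{+j} divided by g - 1 for every vertex j."""
    g = len(vertices)
    indeg = dict.fromkeys(vertices, 0)
    for i, j in arcs:
        if i != j:  # a loop is not a choice by another actor
            indeg[j] += 1
    return {j: indeg[j] / (g - 1) for j in vertices}
```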