Blog Community Discovery and Evolution
Mutual Awareness, Interactions and Community Stories
Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng
@Web Intelligence 2007, April 22, 2008
What do people feel about Hurricane Katrina?
What do people think about global warming?
What is the best school district in Manhattan?
How do teenagers like the movie Transformers?
[Figure: communities over time (Jun, Jul, Aug, Sep): expert reviewers, well-heeled enthusiasts, amateurs, semi-professional shooters]
Research Scope
■ Goal: extract query-sensitive communities and their dynamics from blog networks
  • What is a community?
  • How does a community form?
  • How does a community change?
■ Approach:
  • Observation: mutual awareness
  • Community discovery using iterative clustering
  • Community dynamics via temporal correlation
Result: communities for the query "Hurricane Katrina"
Talk Outline
Motivation and Goal
Related Work
Community Discovery
Community Dynamics
Experiments
Summary and Conclusions
Related Work
Prior work
• Online social network dynamics: bursty evolution of links [Kumar 2005]; dynamic community formation [Backstrom 2006]
• Graph clustering: normalized cut [Shi 2000]; kernel k-means [Dhillon 2005]; interactive spectral clustering [Kanna 2004]
• Community evolution: quantify the social group evolution [Palla 2007] [Falkowski 2006]
Our work
• Online social network dynamics: micro level (individual communities) structural and thematic changes
• Graph clustering: clustering criteria based on symmetric social distance
• Community evolution: community correlation based on member interaction
Community Discovery
• Mutual Awareness
• Community Formation
• Social Distance from Random Walk
• Extraction Algorithm
What is a community?
■ Notions of community
  • Virtual / online community [Rheingold 2000]: social aggregations that emerge from the Net when enough people carry on those public discussions long enough, with sufficient human feeling
  • Virtual settlement [Jones 1997]: interactivity, communicators, virtual common-public-place, sustained membership
  • Sense of community [McMillan 1996]: spirit, trust, trade and art
  • Sense of community among blogs [Blanchard 2004]
  • Focus theory [Feld 1981]
  • …
  • Mutual awareness [Dourish 2001]: presence and awareness
Principal Insight: Mutual Awareness
Communities emerge due to mutual awareness
[Figure: "me" and "Lisa"]
Community Formation
• Mutual awareness expansion: transitivity, reciprocity, frequency
• Mutual awareness is built from mutually observable actions
• Community: a group of people interacting with each other more closely than with others
[Figure: "me" and "Lisa"]
Social Distance
• Original six degrees of separation experiment [Travers and Milgram 1969]
• Expected symmetric social distance
[Figure: chains of numbered intermediaries between "me" and "Lisa"]
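Social Distance from Random Walk
• Transition matrix: $P = D^{-1} W$
  W: mutually observable interactions
  D: diagonal matrix with $d_{ii} = \sum_j w_{ij}$
• Hitting time $\tau_{u \to v}$: expected hops from u to v
• Commute time $\tau_{u \leftrightarrow v} = \tau_{u \to v} + \tau_{u \leftarrow v}$: expected hops from u to v and from v back to u
• Laplacian matrix: $L = D(I - P) = D - W$, with $\mathrm{vol}(W) = \sum_{i,j} w_{ij}$
• Spectral form of the commute time:
  $$\tau_{u \leftrightarrow v} \approx \mathrm{vol}(W) \sum_{i=2}^{k} \frac{1}{\lambda_i} \left( \phi_i(u) - \phi_i(v) \right)^2$$
  $\lambda_k$: the k-th smallest eigenvalue
  $\phi_k$: the eigenvector associated with the k-th smallest eigenvalue
Ref. [Chung 2000]
[Figure: random walk on the blogger graph, stepping from i to j with probability $P_{ij}$; u = "me", v = "Lisa"]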
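The hitting-time and commute-time definitions above can be checked by hand on a tiny graph. The sketch below is only an illustration, not the authors' implementation: it solves the hitting-time equations exactly with a small linear solver instead of using the eigenvector approximation, and the function names and the three-node example graph are assumptions made for this example.

```python
from fractions import Fraction


def transition_matrix(W):
    """Row-normalise a weight matrix: P = D^-1 W."""
    return [[Fraction(w) / Fraction(sum(row)) for w in row] for row in W]


def hitting_times(P, target):
    """Expected hops to reach `target` from every node.

    Solves h[u] = 1 + sum_j P[u][j] * h[j] for u != target, with
    h[target] = 0, by Gauss-Jordan elimination (exact arithmetic).
    Assumes every node can reach `target`.
    """
    nodes = [u for u in range(len(P)) if u != target]
    n = len(nodes)
    # Augmented system [I - P' | 1], restricted to the non-target nodes.
    A = [[(1 if u == v else 0) - P[u][v] for v in nodes] + [Fraction(1)]
         for u in nodes]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [a / pivot_value for a in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    h = {target: Fraction(0)}
    for idx, u in enumerate(nodes):
        h[u] = A[idx][n]
    return h


def commute_time(P, u, v):
    """tau_{u<->v}: hops from u to v plus hops from v back to u."""
    return hitting_times(P, v)[u] + hitting_times(P, u)[v]


# Hypothetical example: a path graph 0 - 1 - 2 with unit weights.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_matrix(W)
print(commute_time(P, 0, 2))
```

For this path graph the two end nodes are further apart, in commute time, than either is from the middle node, which matches the intuition of social distance growing with the number of intermediaries.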
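Extraction Algorithm
Note: the eigenvectors of P for its k largest eigenvalues are equivalent to the eigenvectors of L for its k smallest eigenvalues
Criterion: expected symmetric social distance
$$S = \arg\max_{S \subset V} \; \omega(S, V \setminus S) \sum_{u \in S,\; v \in V \setminus S} \tau_{u \leftrightarrow v}$$
• $\omega(S, V \setminus S)$: weighting for balanced splits
• $S$: a set of bloggers; $V \setminus S$: the remaining bloggers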
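To make the split criterion concrete, here is a minimal brute-force sketch for a single split. It is not the extraction algorithm from the talk: the slides do not define the balancing weight, so `balanced_weight` (one over the product of the two part sizes) is purely an assumption, as are the function names and the hand-made commute times. Enumerating all subsets is only feasible for toy inputs.

```python
from itertools import combinations


def balanced_weight(S, rest):
    # ASSUMPTION: the slides only say "weighting for balanced splits";
    # this particular choice turns the score into an average cross distance.
    return 1.0 / (len(S) * len(rest))


def best_split(nodes, tau, weight=balanced_weight):
    """Return the subset S maximising weight(S, V\\S) * sum of cross commute times."""
    best, best_score = None, float("-inf")
    for size in range(1, len(nodes)):
        for S in combinations(nodes, size):
            rest = [v for v in nodes if v not in S]
            cross = sum(tau[frozenset((u, v))] for u in S for v in rest)
            score = weight(S, rest) * cross
            if score > best_score:
                best, best_score = set(S), score
    return best, best_score


# Hypothetical commute times: a and b are close, c and d are close,
# and the two pairs are far from each other.
nodes = ["a", "b", "c", "d"]
tau = {
    frozenset(("a", "b")): 2, frozenset(("c", "d")): 2,
    frozenset(("a", "c")): 10, frozenset(("a", "d")): 10,
    frozenset(("b", "c")): 10, frozenset(("b", "d")): 10,
}
split, score = best_split(nodes, tau)
print(split, score)
```

The split that separates the two tight pairs wins because every pair it cuts is far apart in commute time, while any other split also cuts a short-distance pair.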
Community Dynamics
• Interaction-based representation
• Interaction Correlation
• Evolutionary Patterns
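Interaction-based representation
• Communities behave differently because of their members' interactions
• Members play different roles in a community
Interaction matrix for community A (N bloggers):
$$x(A; i, j) = \begin{cases} [P]_{ij}, & \text{if } i \in A \\ 0, & \text{otherwise} \end{cases}$$
[Figure: communities A and B at one time step, A' and B' at the next; which later community corresponds to which earlier one?]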
Interaction Correlation
Interaction correlation between community A and B' (histogram intersection):
$$s(A, B') = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \min\left( x(A; i, j),\, x(B'; i, j) \right)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \max\left( x(A; i, j),\, x(B'; i, j) \right)}$$
[Figure: community A compared with a later community B']
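The interaction matrix and the histogram-intersection score can be sketched in a few lines. This is an illustrative reading of the two formulas, not the authors' code: the function names, the toy transition matrix, and the choice to return 0 when both matrices are empty are assumptions.

```python
def interaction_matrix(P, community):
    """x(A; i, j) = P[i][j] if blogger i belongs to A, otherwise 0."""
    n = len(P)
    return [[P[i][j] if i in community else 0.0 for j in range(n)]
            for i in range(n)]


def interaction_correlation(x_a, x_b):
    """Histogram intersection: sum of element-wise minima over sum of maxima."""
    pairs = [(a, b) for row_a, row_b in zip(x_a, x_b)
             for a, b in zip(row_a, row_b)]
    numerator = sum(min(a, b) for a, b in pairs)
    denominator = sum(max(a, b) for a, b in pairs)
    # ASSUMPTION: two empty interaction matrices are treated as uncorrelated.
    return numerator / denominator if denominator else 0.0


# Hypothetical 3-blogger transition matrix (a star centred on blogger 0).
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
x_a = interaction_matrix(P, {0, 1})
x_b = interaction_matrix(P, {0, 2})
print(interaction_correlation(x_a, x_b))
```

In practice the two communities come from different time steps, so each would be built from the transition matrix of its own time step; the example reuses one matrix only to keep it short.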
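Evolutionary Patterns
For a community $C_i$ at time t and a community $C_j$ at time t+1, with interaction correlation $s$:
• $\mathrm{Prior}(C_j) = \arg\max_{C_i} s(C_i, C_j)$
• $\mathrm{Post}(C_i) = \arg\max_{C_j} s(C_i, C_j)$
[Figure: (c) split: $C_i$ at time t, $C_j$ and $C_j'$ at time t+1, linked by Prior($C_j$) and Post($C_i$)]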
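The Prior and Post definitions amount to two argmax lookups over a table of interaction-correlation scores. The sketch below assumes the scores are already computed; the function name, the community labels, and the score values are hypothetical.

```python
def prior_post(scores, earlier, later):
    """Match communities across two time steps.

    scores[(ci, cj)] is the interaction correlation s(ci, cj) between a
    community ci at time t and a community cj at time t+1.
    """
    prior = {cj: max(earlier, key=lambda ci: scores[(ci, cj)]) for cj in later}
    post = {ci: max(later, key=lambda cj: scores[(ci, cj)]) for ci in earlier}
    return prior, post


# Hypothetical scores: C1 overlaps with both D1 and D2, C2 with neither much.
earlier = ["C1", "C2"]
later = ["D1", "D2"]
scores = {
    ("C1", "D1"): 0.6, ("C1", "D2"): 0.3,
    ("C2", "D1"): 0.1, ("C2", "D2"): 0.2,
}
prior, post = prior_post(scores, earlier, later)
print(prior)
print(post)
```

Here both later communities point back to C1 as their Prior. Reading that as a split of C1 is one plausible interpretation of the split pattern in the figure, not a rule stated on the slide.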
Experiments
• Experimental setup
• Evaluation Metrics
• Comparison with Baseline Methods
• Stories
Experimental setup
■ Dataset: real-world blogs
  • 407 blogs during 63 consecutive weeks (July 10, 2005 – September 23, 2006)
  • 0.27M entries, 0.15M entry-entry links
■ Query-sensitive graph
  • Picked keywords related to four significant events: "katrina", "london bomb", "ipod nano", "zotob worm"
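Evaluation Metrics
■ Cohesiveness (edge counts $E_w$, $E_b$, $E_o$ as labeled in the figure)
  • coverage $= \dfrac{E_w}{E_w + E_b}$
  • conductance $= \dfrac{E_b}{\min(E_b + E_w,\; E_b + E_o)}$
■ Consistency
  • $p_{ij} = m_{ij} / m_j$
  • entropy $H(j) = -\dfrac{1}{\log L} \sum_{i=1}^{L} p_{ij} \log p_{ij}$
  • L: number of communities
Ideal community extraction: low conductance, high coverage, low entropy
[Figure: communities $C_i$ and $C_j$ along the time axis, with counts $m_{ij}$ and $m_j$]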
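The three metrics are simple enough to compute directly from counts. The sketch below is an illustration under stated assumptions: it reads E_w, E_b and E_o as the numbers of edges within a community, between the community and the rest, and outside the community (the slide only labels them in a figure), it uses the usual convention that 0 log 0 counts as 0, and all function names and example counts are made up.

```python
import math


def coverage(e_w, e_b):
    """E_w / (E_w + E_b): higher is better."""
    return e_w / (e_w + e_b)


def conductance(e_w, e_b, e_o):
    """E_b / min(E_b + E_w, E_b + E_o): lower is better."""
    return e_b / min(e_b + e_w, e_b + e_o)


def entropy(m_column):
    """H(j) = -(1 / log L) * sum_i p_ij log p_ij, with p_ij = m_ij / m_j.

    m_column holds the counts m_ij for i = 1..L (requires L >= 2).
    Zero counts contribute nothing (0 log 0 is taken as 0).
    """
    L = len(m_column)
    m_j = sum(m_column)
    h = 0.0
    for m_ij in m_column:
        if m_ij > 0:
            p = m_ij / m_j
            h -= p * math.log(p)
    return h / math.log(L)


# Hypothetical counts for one community.
print(coverage(8, 2))
print(conductance(8, 2, 30))
print(entropy([10, 0]))   # all members stay together
print(entropy([5, 5]))    # members scatter evenly over two communities
```

Dividing by log L keeps the entropy between 0 and 1, so values are comparable across runs that produce different numbers of communities.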
Comparison with Baseline Methods
MAE outperforms the baseline methods: lower conductance, higher coverage, lower entropy, and relatively low variation
MAE: mutual awareness expansion (our method)
KKM: kernel k-means
SPEC: normalized cut
ICC: iterative conductance cutting
Stories
Hurricane Katrina
London bombing
iPod nano
Computer worm
Hurricane Katrina
Timeline: 8/23 Hurricane Katrina forms; 8/29 levee failure in New Orleans; 9/17 Hurricane Rita forms
[Figure: community evolution for the Katrina query. Node size: community size; node shade: query relevancy. Labeled groups: right wing, left wing, political communities, technical communities. Inset: communities extracted at week 5]
London Bombing
[Figure: labeled groups: political communities, technical communities]
iPod nano
[Figure: labeled groups: fan communities]
Summary and Conclusions
Summary
■ Queries that involve costly decisions require contemplating multiple viewpoints
■ Community discovery: mutual awareness
  • Extract communities using symmetric social distance
■ Community dynamics: temporal correlation
  • Extract evolutionary patterns using histogram intersection between interaction matrices
■ Results:
  • Outperforms baseline community detection methods
  • Insightful results for community evolution
Conclusions
■ Combining social aspects with graph theory helps us discover meaningful communities
■ Tracking community evolution reveals a complex picture of multiple viewpoints, which is important for decision making
■ Future work:
  • A unified framework that considers membership consistency and evolutionary relationships
  • An approach for discovering emergent communities and supporting community awareness
  • Validating community analysis through user actions and ethnographic study
Thanks!
Questions?