Automated Discovery and Visualization of
Online Social Networks
Anatoliy Gruzd @dalprof
Associate Professor, School of Information Management
Director, Social Media Lab
Faculty of Management / Faculty of Computer Science
Dalhousie University
Summer School “Social Network Analysis: Internet Research”, St.Pete, Rus, 2013
Dalhousie University
Faculty of Management
School of Information Management
Social Media Lab
SocialMediaLab.ca
http://SocialMediaAndSociety.com/
New Sage Journal: Big Data & Society
• Open Access & Multidisciplinary
• Editors
– Evelyn Ruppert (Sociology, Goldsmiths, UK)
– Paolo Ciuccarelli (Density Design, Milan, IT)
– Anatoliy Gruzd (School of Information Management, Dalhousie University, CA)
– Adrian Mackenzie (Sociology, Lancaster, UK)
– Richard Rogers (Digital Methods Initiative, Amsterdam, NL)
– Irina Shklovski (Digital Media & Communication Research Group, IT University of Copenhagen, DK)
– Judith Simon (Institute for Technology Assessment and Systems Analysis, Karlsruhe Institute of Technology, DE)
– Matt Zook (New Mappings Collaboratory, Geography, Kentucky, US)
Special Issue on Measuring Influence in Social Media
http://bit.ly/cfpInfluence
Editors:
– Anatoliy Gruzd, School of Information Management, Dalhousie University
– Barry Wellman, Department of Sociology, University of Toronto
Agenda
• Thursday, Aug 16, 10:00-11:30
Automated Discovery of Social Networks from Online Data
(Part 1): Online Forums
• Friday, Aug 17, 10:00-11:30
Automated Discovery of Social Networks from Online Data
(Part 2): Blogs and MicroBlogs
• Wednesday, Aug 21, 10:00-11:30
Sense of Community in Online Communities
• Thursday, Aug 22, 10:00-11:30
Practice with Netlytic.org
Growth of Social Media and Social Networks Data
1B users
500M users
Social Media have become an integral part of our daily lives!
How to Make Sense of Social Media Data?
Social Network Analysis (SNA)
• Nodes = Group Members/People
• Edges/Ties (lines) = Relations/Connections
Why Do We Want To Discover Online Social Networks?
• Users
– More useful recommendation systems
• Amazon, Netflix
– A more secure and easier way to share private content with trusted individuals
• “Web of Trust” (Golbeck, 2008; Matsuo et al., 2004)
– Improve users’ experience with information systems
• Keeping in touch with friends and colleagues (e.g., LinkedIn, Facebook)
• New browsing capabilities for news stories (Pouliquen et al., 2007; Tanev, 2007)
Why Do We Want To Discover Online Social Networks?
• Companies
– Recruiting talent
• Different ties for different needs (Leung, 2003)
– Finding experts
• Expertise-oriented searching using social networks (Ehrlich et al., 2007; Li et al., 2007)
– Marketing
• Viral marketing (Domingos, 2005)
• Building brand loyalty using customer networks (Thompson & Sinha, 2008)
Advantages of Using Social Network Analysis to Analyze Social Media Data
• Reduces a large quantity of data to a more concise representation
• Makes it much easier to understand what is going on in a group
Why Do We Want To Discover Online Social Networks?
• Researchers
– Ability to ask and answer deeper questions about the nature and operation of online communities
• How and why does one online community emerge and another die?
• How do people agree on common practices and rules in an online community?
• How are knowledge and information shared among group members?
How Do We Collect Information About Social Networks?
• Common approach: surveys or interviews
• A sample question about students’ perceived social structures (based on C. Haythornthwaite’s 1999 LEEP study protocol):
Please indicate on a scale from [1] to [5],
YOUR FRIENDSHIP RELATIONSHIP WITH EACH STUDENT IN THE CLASS
[1] - don’t know this person
[2] - just another member of class
[3] - a slight friendship
[4] - a friend
[5] - a close friend
Alice D. [1] [2] [3] [4] [5]
…
Richard S. [1] [2] [3] [4] [5]
LimeSurvey (open source) – www.limesurvey.org
VENNMAKER http://www.vennmaker.com/en
Using VennMaker to Collect an Ego Network
How Do We Collect Information About Online Social Networks?
Problems with surveys or interviews
• Time-consuming
• Questions can be too sensitive
• Answers are subjective or incomplete
• Participants can forget people and interactions
• Different people perceive events and relationships differently
How Do We Collect Information About Social Networks?
Different Types of Online Social Networks
http://www.visualcomplexity.com/vc
• Email networks
• Forum networks
• Blog networks
• Friends’ networks on Facebook, Twitter, etc.
• Networks of like-minded people on
Automated Discovery of Social Networks
Example: Emails (diagram nodes: Nick, Rick, Dick)
• Nodes = People
• Ties = “Who talks to whom”
• Tie strength = The number of messages exchanged between individuals
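The tie-strength rule on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `messages` list, its (sender, recipient) format, and the function name `build_email_network` are made up for the example, and ties are treated as undirected.

```python
from collections import Counter

def build_email_network(messages):
    """Count the messages exchanged between each pair of people.

    `messages` is a list of (sender, recipient) pairs (a hypothetical
    input format). Ties are undirected here, so Nick->Rick and Rick->Nick
    add to the same tie strength.
    """
    ties = Counter()
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore messages people send to themselves
        pair = tuple(sorted((sender, recipient)))
        ties[pair] += 1
    return ties

# Made-up sample data, for illustration only.
messages = [("Nick", "Rick"), ("Rick", "Nick"), ("Nick", "Dick"), ("Dick", "Rick")]
email_ties = build_email_network(messages)
for (person_a, person_b), strength in sorted(email_ties.items()):
    print(f"{person_a} -- {person_b}: {strength}")
```

Keeping the direction of each message (who wrote to whom) would only require dropping the `sorted` call.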
Automated Discovery of Social Networks
“Many to Many” Communication
Chat, Mailing listserv, Forum, Comments
Automated Discovery of Social Networks
Approach 1: Chain Network (Reply-to)
Posting header: FROM: Sam PREVIOUS POSTER: Gabriel
Content: “ Nick, Gina and Gabriel: I apologize for not backing this up with a good source, but I know from reading about this topic that … ”
Possible Missing Connections:
• Sam -> Nick
• Sam -> Gina
• Nick <-> Gina
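A chain network of this kind can be sketched as follows. The input format (a list of poster names in posting order) and the function name are assumptions for illustration; real forum data would take the previous poster from the posting header instead.

```python
from collections import Counter

def build_chain_network(thread):
    """Link each poster to the author of the immediately preceding post.

    `thread` is a list of poster names in posting order (a hypothetical
    input format). Returns directed ties: (poster, previous_poster) -> count.
    """
    ties = Counter()
    for previous, current in zip(thread, thread[1:]):
        if previous != current:  # skip people replying to themselves
            ties[(current, previous)] += 1
    return ties

# Made-up thread that ends like the example: Sam posts right after Gabriel.
thread = ["Nick", "Gina", "Gabriel", "Sam"]
chain_ties = build_chain_network(thread)
print(sorted(chain_ties))
```

Only Sam -> Gabriel is recorded for Sam's post; the ties to Nick and Gina that the greeting implies are not recovered, which is the gap listed under "Possible Missing Connections".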
Chain Networks: missed info.
Ex.2
FROM: Eva REFERENCE CHAIN: Gabriel, Sam, Gina
“ Gina, I owe you a cookie. This is exactly what I wanted to know. I was already planning on taking 402 next semester, and now I have something to look forward to! ”
Ex.3
FROM: Fred
“ I wonder if that could be why other libraries around the world have resisted changing – it's too much work, and as Dan pointed out, too expensive. ”
How Do We Collect Information About Social Networks?
Research Question
What content-based features of online interactions can help to uncover nodes and ties between group members?
Automated Discovery of Social Networks
Approach 2: Name Network
FROM: Ann
“Steve and Natasha, I couldn't wait to see your site.
I knew it was going to [be] awesome!”
This approach looks for personal names in the content of the messages to identify social connections between group members.
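A much-simplified sketch of the idea is shown below. It assumes the group roster is already known and matches names as whole words; the `posts` and `members` inputs and the function name are hypothetical, and discovering names without a roster would need more than this.

```python
import re
from collections import Counter

def build_name_network(posts, members):
    """Tie each poster to every group member named in the post text.

    `posts` is a list of (poster, text) pairs and `members` is the group
    roster; both are hypothetical inputs for this sketch.
    """
    ties = Counter()
    for poster, text in posts:
        for name in members:
            # Whole-word, case-sensitive match so "Ann" does not match "Annual".
            if name != poster and re.search(rf"\b{re.escape(name)}\b", text):
                ties[(poster, name)] += 1
    return ties

members = ["Ann", "Steve", "Natasha"]
posts = [("Ann", "Steve and Natasha, I couldn't wait to see your site. "
                 "I knew it was going to be awesome!")]
name_ties = build_name_network(posts, members)
print(sorted(name_ties))
```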
Automated Discovery of Social Networks
Approach 2: Name Network
• Main Communicative Functions of Personal Names (Leech, 1999)
– getting attention and identifying addressee
– maintaining and reinforcing social relationships
• Names are “one of the few textual carriers of identity” in discussions on the web (Doherty, 2004)
• Their use is crucial for the creation and maintenance of a sense of community (Ubon, 2005)
Summary
1. Why Do We Want To Discover Online Social Networks?
2. How Do We Collect Information About Social Networks?
– Self-reported vs. Observed, Manual vs. Automated
3. Automated Discovery of Social Networks from Online Conversations
– Methods: Chain Networks vs. Name Networks
4. Evaluation of Name Networks with Forum Data
Automated Discovery of Social Networks
Name Network Method: Challenges
• Kurt Cobain, a lead singer for the rock band Nirvana
• chris is not a group member
• Santa Monica Public Library
• John Dewey, philosopher & educator
• mark up language
Solution: Name alias resolution
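The slide names alias resolution as the solution without giving details, so the following is only one possible sketch. It assumes a hand-built lookup from name variants to group members (the `alias_table` and its member ids are hypothetical) and drops any name that does not resolve to a member.

```python
def resolve_names(candidates, alias_table):
    """Map names found in a post to group members, dropping the rest.

    `alias_table` maps lowercase name variants to a canonical member id
    (a hypothetical, hand-built lookup for this sketch).
    """
    resolved = []
    for candidate in candidates:
        member = alias_table.get(candidate.lower())
        if member is not None and member not in resolved:
            resolved.append(member)
    return resolved

alias_table = {"steve": "steve_k", "steven": "steve_k", "natasha": "natasha_p"}
candidates = ["Steve", "Kurt Cobain", "chris", "Natasha", "Steven"]
resolved_members = resolve_names(candidates, alias_table)
print(resolved_members)
```

A lookup like this filters out non-members such as "Kurt Cobain" and "chris", but it cannot tell a member named Mark from the "mark" in "mark up language"; that kind of ambiguity needs context from the surrounding text.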
Evaluating Name Networks
Example: YouTube comments
Chain Network (fewer connections) vs. Name Network (more connections)
Evaluating Name Networks
Results from Online Learners Dataset
Dataset
• Classes: 6
• School year: Spring 2008
• Duration of each class: 15 weeks
• No. of students per class: 17 – 29
• Data source: bulletin board messages; online questionnaire
[Charts: No. of all postings per class (axis 0 to 2000) and No. of students per class (axis 0 to 30), for Classes #1 to #6]
Gruzd, A. (2009). Studying Collaborative Learning Using Name Networks. Journal of Education for Library and Information Science 50(4): 243-253.
Evaluating Name Networks
Networks compared:
• From forum postings: Name Network vs. Chain Network
• From the survey: Self-Reported Network vs. Name Network, and vs. Chain Network
Comparison Procedure:
• QAP correlations
• Exponential random graph models
• Manual exploration using network visualization
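For readers unfamiliar with QAP correlation, here is a minimal pure-Python sketch of the general technique: correlate the off-diagonal cells of two adjacency matrices, then re-compute the correlation after randomly relabeling the nodes of one matrix (rows and columns together) to see how often chance does as well. The matrices, function names, and the one-sided p-value are assumptions for illustration, not the settings used in the study.

```python
import random

def off_diagonal(matrix, order):
    """Flatten the matrix, skipping the diagonal, after relabeling nodes by `order`."""
    size = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(size) for j in range(size) if i != j]

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5

def qap_correlation(a, b, n_permutations=1000, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    b_values = off_diagonal(b, identity)
    observed = pearson(off_diagonal(a, identity), b_values)
    as_extreme = 0
    for _ in range(n_permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel nodes: rows and columns move together
        if pearson(off_diagonal(a, order), b_values) >= observed:
            as_extreme += 1
    p_value = (as_extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Made-up 4-person adjacency matrices, for illustration only.
name_net = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
chain_net = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
r, p = qap_correlation(name_net, chain_net, n_permutations=200)
print(f"QAP correlation: {r:.2f} (p = {p:.3f})")
```

Permuting node labels rather than individual cells keeps each network's structure intact, which is why QAP is preferred over an ordinary significance test for network data.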
Results
• Name networks provide on average 40% more information about social ties in a group as compared to Chain networks
“New” Info
• 82%: An addressee has not posted to the thread
• 18%: An addressee is not the most recent poster
• 70%: Thread-starting posting
• 30%: A subsequent posting in the thread
Name Network vs. Chain Network: QAP correlation ~ 0.5
Results
• Self-reported networks are almost twice as likely to share the same ties with Name networks as with Chain networks.
Results
• The following social relations were found by the “name network” method: Learn ● Collaborative Work ● Help
• These social relations are considered by many researchers to be crucial in shared knowledge construction and community building; thus, name networks can be useful in the assessment of collaborative learning
Learn
• ‘Learn’ relation is often discovered in postings that refer to somebody else who has presented or posted something a while ago or via a different communication channel
– “… it made me think of [an example] that Karen posted ”
Collaborative Work
• Organizing group work, taking a leadership role
– “ Some quick poking around shows that Steve and myself are here in Champaign, [...] and Nicole is in Chicago. [...] does anyone have a strong desire to be our contact person to the administrators ”
• A reference to an event or interaction that happened outside the online forum
– “ Anne and I have been corresponding via e-mail and she reminded me that we should be having discussion here ”
• A reference to the whole group or posting on behalf of the whole group
– “ Steve and Natasha, I couldn't wait to see your site. I knew it was going to [be] awesome! ”
Help
• ‘Help’ relation is often discovered in
– postings from students asking for the instructor’s help or
– postings that mention a classmate’s name in the context of words like “thank you”, “help”, “assistance”
Help
• Ask the instructor about something
– “ [Instructor’s name] if you see this posting would you please clarify for us ”
• Ask peers to clarify something that the instructor said during the lecture
– “ I remember [Instructor’s name] asking us to email her with topics [...] I wonder if that is in replacement of our bb question? ”
Options for Automated Discovery of Communication Networks among Blogs
Furukawa et al. (2007), Ali-Hasan & Adamic (2007):
– Blogroll links
– Citation links
– Comment links
– Trackback
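As a rough illustration of link-based discovery, the sketch below counts hyperlinks between a known set of blogs using only the Python standard library. It assumes the pages have already been downloaded (the `pages` dictionary and the example hostnames are made up) and it treats every hyperlink the same; telling blogroll, citation, comment, and trackback links apart would need blog-specific page parsing.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)

def build_blog_network(pages):
    """Count links between blogs.

    `pages` maps a blog's hostname to the HTML of one of its pages
    (hypothetical, already-downloaded input). A tie A -> B is counted
    each time blog A links to another blog in the set.
    """
    ties = Counter()
    for source, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            target = urlparse(link).hostname  # None for relative links
            if target in pages and target != source:
                ties[(source, target)] += 1
    return ties

# Made-up pages, for illustration only.
pages = {
    "blog-a.example": '<p>See <a href="http://blog-b.example/post/1">this post</a>.</p>',
    "blog-b.example": '<a href="http://blog-a.example/">A</a> <a href="http://other.example/">X</a>',
}
blog_ties = build_blog_network(pages)
print(sorted(blog_ties.items()))
```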
Discovering Networks in Blogosphere Example: Political Blogs
http://presidentialwatch08.com
Gruzd, A., Black, F.A., Le, Y., Amos, K. (2012). Investigating Biomedical Research Literature in the Blogosphere: A Case Study of Diabetes and HbA1c. Journal of the Medical Library Association 100(1): 34-42. DOI: 10.3163/1536-5050.100.1.007
Case Study:
Online Communities Among Blog Readers
•Can a blog support the development of an online community?
•How do we know if a community has emerged among blog readers?
Gruzd, A. (2009). Automated Discovery of Emerging Online Communities Among Blog Readers: A Case Study of a Canadian Real Estate Blog. Proceedings of the Internet Research 10.0 Conference, October 7-11, 2009, Milwaukee, WI, USA.
Characteristics of Online Community
Virtual Settlement (Jones, 1997)
–virtual common-public-place
–interactivity
–sustained membership
Sense of Community (McMillan & Chavis, 1986)
–feelings of membership & influence
–reinforcement of needs
–shared emotional connection
Comments Posted by Blog Readers
Changes in Social Networks over Time
SNA Statistics
–the posters became more connected and more of them took a stand in a group
#1b1t Twitter Book Club
2012 Olympics in London
#tarsand Twitter Community
A Case Study of @barrywellman’s
“Imagined Community” on Twitter
Gruzd, A., Wellman, B., and Takhteyev, Y. (2011). Imagining Twitter as an Imagined Community. American Behavioral Scientist 55 (10): 1294-1318, DOI: 10.1177/0002764211409378.
Characteristics of Online Community
• Anderson’s imagined communities (1983)
– Common language
– Temporality
– High Centers
• Jones’ Virtual Settlement (1997)
– virtual common-public-place
– interactivity
– sustained membership
• McMillan & Chavis’ Sense of Community (1986)
– feelings of membership & influence
– reinforcement of needs
– shared emotional connection
• Wellman’s “networked individualism”
Dataset
1) August 2009
- 56 mutuals, 140 connections
2) February 2010
- 72 mutuals, 285 connections
3) April 2009 - February 2010
- 3,112 tweets
Twitterspeak: Specialized Language & Norms
For all Twitter users:
• URL shorteners to save space: bit.ly
• Hashtags (#): #Sunbelt
• RT (ReTweet)
• @name (@dalprof)
For Barry’s network:
• “Wellman:” or “Me:”
• “(X of Y)” or “(X/Y)”
“High Centers”
• An “imagined” community on Twitter is dual-faceted -
collective and personal
• The collective Twitter community forms around high
centers who are popular individuals (danah boyd),
celebrities (Britney), or organizations such as media
companies (BBC)
• The high centers in the personal Twitter don't have
to be “celebrities”, but
• The local and overall high centers overlap to some
extent
Interactivity
• 60% of Barry's tweets included the @ sign
• The average Twitter user: 24.5% (Huberman et al., 2009), 12.5% (Java et al., 2006)
• “Name network” (mentions and co-occurrence of usernames)
• 3,112 tweets -> 512 users (1,448 ties) vs. 56 mutuals (101 ties)
• Result: The mutual network of 56 users is 6 times denser than the larger interaction network of 512 users
• QAP correlation of 0.27 (p<0.05) between the mutual and interaction networks
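The "6 times denser" result can be checked with a short calculation. The slide does not state which density formula was used; this sketch assumes the standard definition (ties divided by the number of possible pairs of nodes) and uses the node and tie counts from this slide.

```python
def density(ties, nodes):
    """Share of possible undirected pairs of nodes that are connected."""
    possible_pairs = nodes * (nodes - 1) / 2
    return ties / possible_pairs

mutual = density(ties=101, nodes=56)
interaction = density(ties=1448, nodes=512)
ratio = mutual / interaction
print(f"mutual: {mutual:.4f}, interaction: {interaction:.4f}, ratio: {ratio:.1f}")
```

Under this assumption the mutual network has a density of about 0.066 and the interaction network about 0.011, a ratio of roughly 5.9. The ratio is the same whether ties are counted as directed or undirected, as long as both networks use the same convention.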
“Manual” Clusters in Barry’s Twitter Network
Summary (1)
Barry’s network exhibits characteristics of
• Anderson’s “imagined communities”
• Jones’ “virtual settlement”
• McMillan and Chavis’ “sense of community”
• Wellman’s “networked individualism”
• Barry’s network is both “real” and “imagined”
– real because the participants interact, especially the mutuals
– imagined because they have some sense of community
Summary (2)
• Why has Barry’s online community grown while maintaining a sense of community?
– Core members who actively interact with each other & participate in the community for a long time
– Barry's community is open to newcomers
• Twitter’s asymmetric connections
• Trust, professionalism and informality among the active mutuals
• Combo of strong & weak ties
connectivity between social circles
Case Study: #hcsmca Twitter Community
hcsmca: Health Care Social Media Canada
Haythornthwaite, C. and Gruzd, A. (forthcoming). Enabling Community through Social Media. Journal of Medical Internet Research
Background
• #hcsmca is a vibrant community of people interested in exploring social innovation in health care. We share and learn, and together we are making health care more open and connected
• #hcsmca hosts a tweet chat every Wednesday at 1 pm ET. The last Wednesday of the month is our monthly evening chat at 9 pm ET.
Source: http://cyhealthcommunications.wordpress.com/hcsmca-2/
Research questions
1. What accounts for the relative longevity of this particular online community?
– Is it because of the founder’s leadership and her continuing involvement in this community?
– Or is there a core group of members who are also actively and persistently involved in this community?
2. What is the composition of this community? Does one’s professional role/title determine a person’s centrality within this community?
Step 1: Data Collection
Data: Public Twitter messages that mentioned the #hcsmca hashtag/keyword
Collection Period: November 12 – December 13, 2012
Software: Netlytic http://netlytic.org
Topics Covered (1)
Nov 14, 2012
T1: Challenge of engaging SM to inform a research agenda
T2: Use of innovation, SM, and gamification to encourage uptake of self-care
Topics Covered (2)
Nov 21, 2012
T1: Healthcare blogs: should we or shouldn’t we, what have we learned, what are the benefits?
T2: Are healthcare blogs a useful tool for education and knowledge transfer?
Topics Covered (3)
Nov 28, 2012
T1: How has social media made you healthier? Unhealthier? Has social media made our health choices more numerous and thus overwhelming?
T2: What messaging would motivate you to make a positive health change? Who would you listen to?
Step 2: Discover #hcsmca Communication Network
“Name Networks” Tie = Who mentions or replies to whom
Automated Discovery of Online Social Networks
Example: Tweets (diagram nodes: @John, @Peter, @Paul)
• Nodes = People
• Ties = “Who retweeted/replied/mentioned whom”
• Tie strength = The number of retweets, replies or mentions
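The tie definition on this slide can be sketched with a regular expression that picks up every @username in a tweet. This is an illustrative approximation, not Netlytic's implementation: the `tweets` input format and the sample tweets are made up, and the simple pattern would also match the part after "@" in an email address.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def build_tweet_network(tweets):
    """Build directed, weighted ties from tweet text.

    `tweets` is a list of (author, text) pairs (a hypothetical input
    format). Every @username in the text, whether it comes from a retweet,
    a reply, or a plain mention, adds 1 to the tie author -> username.
    """
    ties = Counter()
    for author, text in tweets:
        for username in MENTION.findall(text):
            if username.lower() != author.lower():
                ties[(author, username)] += 1
    return ties

# Made-up tweets using the usernames from the diagram.
tweets = [
    ("John", "RT @Peter: great chat today #hcsmca"),
    ("John", "@Peter thanks for sharing"),
    ("Paul", "Agree with @John and @Peter #hcsmca"),
]
tweet_ties = build_tweet_network(tweets)
print(sorted(tweet_ties.items()))
```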
Step 3: Social Network Analysis using ORA and Ucinet
#hcsmca Communication Network on Twitter (Nov 12 - Dec 13)
Net viz in Netlytic: http://netlytic.org/gephi/sigma.php?c=0ZnbSm6D23u07bT0&viz=2
#hcsmca Communication Network on Twitter (Nov 12 - Dec 13)
Roles (assigned manually)                          Count
SM health content providers                        110
Unaffiliated individual users                      89
Communicators - not specifically health related    74
Communicators - Health related                     59
Healthcare professionals                           50
Health institutions                                31
Advocacy                                           30
Students                                           16
Educators, professors                              13
Researchers                                        10
Government and health policy makers                4
Node size = In-Degree Centrality
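In-degree centrality, used here for node size, counts how many others point to a node. The sketch below computes the unweighted version from a list of directed ties; the input format and sample ties are assumptions, and the slides do not say whether the tools used a weighted or normalized variant.

```python
from collections import Counter

def in_degree(directed_ties):
    """Count, for each node, how many distinct others point to it.

    `directed_ties` is an iterable of (source, target) pairs
    (a hypothetical input format for this sketch).
    """
    incoming = Counter()
    for source, target in set(directed_ties):  # set() drops repeated ties
        if source != target:
            incoming[target] += 1
    return incoming

# Made-up ties: Peter is mentioned by two different people.
directed_ties = [("John", "Peter"), ("Paul", "Peter"), ("Paul", "John"), ("John", "Peter")]
centrality = in_degree(directed_ties)
print(centrality.most_common())
```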
#hcsmca Communication Network on Twitter
Nodes are automatically grouped based on their roles
No apparent clustering among people in the same role (notice cross-group ties)
Procedure: Analysis of Variance Density Test using UCINET
Netlytic: a cloud-based analytic tool for automated text analysis & discovery of social networks from online communication
Networks ● Stats ● Content
http://netlytic.org
1) Capture public, online, conversational-type data such as tweets, blog comments, forum postings, and text messages.
2) Find and explore emerging themes of discussions among individuals within your data set.
3) Build and visualize communication networks to discover and explore emerging social connections between individuals.
General Stats about Your Dataset
Interactive Tag Cloud of Top Words From Your Dataset
Example: 2012 Halifax Municipal Election Dataset
Stacked Chart of Top Words Over Time
Examples: “@Tomaskformore”, “@teamsavagehrm”, “Online”, etc.
All Mentions of a Particular Top Word or Concept Are 1 Click Away
TreeMap of User-defined Cognitive & Social Categories
Examples: “Fear”, “promotion”, “self”, “uncertainty”, “disagreement”, etc.
User-defined Cognitive & Social Categories for “Disagreement”
Examples: “No”, “wrong”, “however”, “agree…but”, etc.
All Mentions of “NO” Within the User-Defined Cognitive & Social Categories for “Disagreement”
Visualization of Communication Networks
Netlytic can automatically build 2 types of networks from your dataset:
1. Chain Network (Who Replies-to-Whom)
2. Name Network (Who Mentions-Whom within a Message)
All Connections Between Any 2 Nodes Are 1 Click Away
#1b1t Twitter Book Club
Gruzd, A. and Sedo, D.R. (2012). #1b1t: Investigating Reading Practices at the Turn of the Twenty-first Century. Journal of Studies in Book Culture 3(2). Available at http://id.erudit.org/iderudit/1009347ar
Sample Dataset