Social Media Data Collection & Network Analysis with Netlytic and R
Anatoliy Gruzd
@gruzd
Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University
HKBU, Hong Kong
Dec 3, 2015
Twitter: @gruzd ANATOLIY GRUZD 1
Presentation Slides
http://bit.ly/hk15slides
Social Media sites have become
an integral part of our daily lives!
Growth of Social Media Data
1.5B users
400M users
300M users
Decision Making
in domains such as Politics, Health Care and Education
How to Make Sense of Social Media Data?
Self-collected/reported
Public APIs
Data Resellers
How to Make Sense of Social Media Data?
Big Data Technology
Credit: Nathan Lapierre
Social Media Analytics Tools
http://socialmedialab.ca/apps/social-media-toolkit/
Data -> Visualizations -> Understanding
How to Make Sense of Social Media Data?
How to Make Sense of Social Media Data?
Example: Geo-based Analysis
Geography of
Twitter Networks
How to Make Sense of Social Media Data?
Example: Geo-based + Content Analysis
Tracking Hate Speech on Twitter
Source: http://www.fenuxe.com/tag/geo-coded
Social Network Analysis (SNA)
• Nodes = People
• Edges/Ties (lines) = Relations, e.g., “Who retweeted/replied to/mentioned whom”
How to Make Sense of Social Media Data?
Advantages of Social Network Analysis
Makes it much easier to understand what is going on in a group.
Once the network is discovered, we can find out:
• How people interact with each other
• Who the most/least active members are
• Who is influential in a group
• Who is susceptible to being influenced, etc.
Figure legend: Liberal, Conservative, Spam, Unknown & Undecided, NDP, Left, Green, Bloc, Other
Gruzd, A. and Roy, J. (2014). Political Polarization on Social Media: Do Birds of a Feather Flock Together on Twitter? Policy & Internet.
Common approach for collecting social network data:
• Surveys or interviews
Problems with surveys or interviews:
• Time-consuming
• Questions can be too sensitive
• Answers are subjective or incomplete
• Participants can forget people and interactions
• Different people perceive events and relationships differently
• As a result, self-reported social network data may not be available or accurate
How Do We Collect Information About Online Social Networks?
Studying Online Social Networks
http://www.visualcomplexity.com/vc
Forum networks
Blog networks
Friends’ networks (Facebook, Twitter, Google+, etc.)
Networks of like-minded people (YouTube, Flickr, etc.)
Goal: Automated Network Discovery
Challenge: Figuring out which content-based features of online interactions can help to uncover nodes and ties between group members
How Do We Collect Information About Online Social Networks?
Automated Discovery of Social Networks
Example: emails among Nick, Rick and Dick
• Nodes = People
• Ties = “Who talks to whom”
• Tie strength = The number of messages exchanged between individuals
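The node, tie and tie-strength definitions above can be sketched in a few lines. This is a minimal illustration in plain Python, not Netlytic's or R's implementation; the `messages` log and the function name `tie_strengths` are hypothetical, and direction is ignored because the slide defines strength as messages exchanged between two people.

```python
from collections import Counter

# Hypothetical message log: one (sender, recipient) pair per email.
messages = [
    ("Nick", "Rick"),
    ("Rick", "Nick"),
    ("Nick", "Dick"),
    ("Nick", "Rick"),
]

def tie_strengths(pairs):
    """Count messages exchanged between each pair, ignoring direction."""
    counts = Counter()
    for sender, recipient in pairs:
        if sender != recipient:  # skip self-addressed messages
            counts[frozenset((sender, recipient))] += 1
    return counts

ties = tie_strengths(messages)
# Nodes are simply everyone who appears in at least one tie.
nodes = sorted({person for pair in ties for person in pair})
```

Here `ties` maps each unordered pair of people to the number of messages between them, which is the tie strength in the sense used on the slide.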
Automated Discovery of Social Networks
“Many to Many” Communication
Chat
Mailing listserv
Forum Comments
Example users: @John, @Peter, @Paul
• Nodes = People
• Ties = “Who retweeted/replied to/mentioned whom”
• Tie strength = The number of retweets, replies or mentions
Automated Discovery of Social Networks
Twitter Networks
Automated Discovery of Social Networks
Twitter Data Examples
Network Ties
@Cheeflo -> @JoeProf
@Cheeflo -> @VMosco
@JoeProf -> @VMosco
Network Tie
@Gruzd -> @SidneyEve
Connection type: Mention
Connection type: Reply
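Ties like the ones listed above can be recovered from tweet text by looking for @-mentions. The following is a rough sketch in plain Python under stated assumptions: the tweet texts are invented for illustration, the function name `twitter_ties` is hypothetical, and a simple regular expression stands in for whatever parsing Netlytic actually performs.

```python
import re
from collections import Counter

# Matches @username; a simplification that would also match "@" in emails.
MENTION = re.compile(r"@(\w+)")

def twitter_ties(tweets):
    """Return a Counter of directed (author, mentioned_user) ties."""
    ties = Counter()
    for author, text in tweets:
        for name in MENTION.findall(text):
            if name.lower() != author.lower():  # ignore self-mentions
                ties[(author, name)] += 1
    return ties

# Hypothetical tweets as (author, text) pairs.
tweets = [
    ("Cheeflo", "Great panel with @JoeProf and @VMosco"),
    ("JoeProf", "@VMosco thanks for the slides"),
]
ties = twitter_ties(tweets)
```

The counter value for each pair is the tie strength. Telling a reply from a mention would need tweet metadata that this sketch does not model.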
Sample Twitter Searches
#ELECTION2016 #HONGKONG
3557 records (Dec 3, 2015)
1394 records (Oct 29, 2015)
What do these visualizations tell us?
SNA Measures
Micro-level
In-degree centrality
Out-degree centrality
Betweenness centrality
Other centrality measures (e.g., closeness, eigenvector)
Macro-level
Density
Diameter
Reciprocity
Centralization
Modularity
SNA Measures
Micro-level: In-degree centrality
In-degree suggests “prestige”, highlighting the Twitter users who are mentioned or replied to most often.
In-degree centrality
#HongKong Twitter network
Note: SEVENTEEN (SVT) is a South Korean boy group formed by Pledis Entertainment
SNA Measures
Micro-level: Out-degree centrality
Out-degree reveals active Twitter users with a good awareness of others in the network.
Out-degree centrality
#HongKong Twitter network
Note: A music fan (many retweets and replies to others)
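In-degree and out-degree, as described above, come down to counting incoming and outgoing ties per user. The sketch below is plain Python with a hypothetical edge list and function name; it counts distinct ties only, which is one possible convention and may differ from what a given tool reports.

```python
from collections import Counter

# Hypothetical directed ties: (source, target) means source retweeted,
# replied to, or mentioned target.
edges = [
    ("fan1", "band"),
    ("fan2", "band"),
    ("fan3", "band"),
    ("fan1", "fan2"),
    ("fan1", "fan3"),
]

def degree_centrality(edge_list):
    """Return (in_degree, out_degree) Counters over distinct ties."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in set(edge_list):  # repeated ties count once
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg

in_deg, out_deg = degree_centrality(edges)
most_mentioned = in_deg.most_common(1)[0]  # highest in-degree ("prestige")
most_active = out_deg.most_common(1)[0]    # highest out-degree (activity)
```

In this toy network `band` receives the most ties and `fan1` sends the most, mirroring the band and music-fan examples from the #HongKong network.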
SNA Measures
Micro-level: Betweenness centrality
Betweenness shows actors who are located on the largest number of information paths and who often connect different groups of users in the network.
Betweenness centrality
#HongKong Twitter network
Note: A fan (retweets/replies to messages from two different fan communities/sites)
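To make "located on the largest number of information paths" concrete, the sketch below counts, for each node, the shortest paths between other nodes that pass through it. It uses Brandes' algorithm on a directed, unweighted graph and returns unnormalized scores; the example network and names are hypothetical, and a given tool may normalize or treat direction differently.

```python
from collections import deque, defaultdict

def betweenness(nodes, edges):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    adj = defaultdict(list)
    for s, t in set(edges):
        adj[s].append(t)
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search that counts shortest paths from source.
        order = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths
        dist = dict.fromkeys(nodes, -1)   # -1 means not reached yet
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

# Hypothetical network: "bridge" is the only link between two groups.
nodes = ["a1", "a2", "bridge", "b1", "b2"]
edges = [("a1", "bridge"), ("a2", "bridge"), ("bridge", "b1"), ("bridge", "b2")]
score = betweenness(nodes, edges)
```

Every path from the `a` group to the `b` group runs through `bridge`, so it is the only node with a non-zero score.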
SNA Measures
Macro-level: Density
Density indicates the overall connectivity in the network (the total number of connections divided by the total number of possible connections).
It is equal to 1 when everyone is connected to everyone.
Example network of User1, User2 and User3: Density = 1
Measure          #Election2016    #HongKong
Nodes            491              2570
Edges            1075             2447
Density          0.005 (0.5%)     0.0004 (0.04%)
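The density definition above is a single division, sketched here in plain Python. The function name is hypothetical, and treating the reported edges as directed ties between distinct users is an assumption about how the figures were produced.

```python
def density(n_nodes, n_edges, directed=True):
    """Observed ties divided by the number of possible ties."""
    possible = n_nodes * (n_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2                  # unordered pairs
    return n_edges / possible if possible else 0.0

# Three users who are all connected to each other: 3 of 3 possible ties.
triangle = density(3, 3, directed=False)

# Node and edge counts from the table, treated as directed ties.
election = density(491, 1075)
hongkong = density(2570, 2447)
```

Under that assumption the formula gives about 0.0045 for #Election2016 and about 0.0004 for #HongKong. These are of the same order as the reported densities, though the tool may count ties somewhat differently.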
SNA Measures
Macro-level: Diameter
Diameter gives a general idea of how “wide” the network is: it is the longest of the shortest paths between any two nodes in the network.
Example network of User1, User2, User3 and User4, where the longest shortest path takes three steps (#1, #2, #3): Diameter = 3
Measure          #Election2016    #HongKong
Diameter         28               14
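"The longest of the shortest paths" can be found by running a breadth-first search from every node and keeping the largest distance seen. The sketch below is plain Python and makes two assumptions that a real tool may not share: ties are treated as undirected, and pairs with no connecting path are ignored. The chain layout of the four users is hypothetical.

```python
from collections import deque

def diameter(edges):
    """Longest shortest path, in steps, over all connected pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    longest = 0
    for start in adj:
        # Breadth-first search gives shortest distances from start.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        longest = max(longest, max(dist.values()))
    return longest

# Hypothetical layout: four users connected in a chain.
chain = [("User1", "User2"), ("User2", "User3"), ("User3", "User4")]
chain_diameter = diameter(chain)
```

For the chain, the two end users are three steps apart, which is the diameter.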
SNA Measures
Macro-level: Reciprocity
Reciprocity shows how many online participants are having two-way conversations.
In a scenario where everyone replies to everyone, the reciprocity value will be 1.
Example network of User1, User2, User3 and User4: Reciprocity = 1
Measure          #Election2016    #HongKong
Reciprocity      0.006 (0.6%)     0.003 (0.3%)
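One common way to turn reciprocity into a number is the share of directed ties that are answered by a tie in the opposite direction. The sketch below assumes that definition, which may not be the exact one behind the reported values; the function name and example ties are hypothetical.

```python
def reciprocity(edges):
    """Fraction of distinct directed ties whose reverse tie also exists."""
    ties = {(s, t) for s, t in edges if s != t}  # drop self-loops, duplicates
    if not ties:
        return 0.0
    mutual = sum(1 for s, t in ties if (t, s) in ties)
    return mutual / len(ties)

# Everyone replies to everyone among three users.
all_reply = [(a, b) for a in "ABC" for b in "ABC" if a != b]
full = reciprocity(all_reply)

# Only A and B talk both ways; the other two ties are one-way.
partial = reciprocity([("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")])
```

`full` is 1, matching the everyone-replies-to-everyone scenario, while `partial` is 0.5 because two of the four ties are reciprocated.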
SNA Measures
Macro-level: Centralization
Centralization indicates whether a network is dominated by a few central participants (values closer to 1) or whether more people are contributing to discussion and information dissemination (values closer to 0).
Example network of User1, User2, User3 and User4: Centralization = 1
Measure          #Election2016    #HongKong
Centralization   0.05             0.11
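One standard way to quantify this is Freeman's degree centralization: sum how far each node's degree falls below the largest degree, then divide by the value that sum takes in a star network. The sketch below assumes that measure on undirected ties; the slides do not say which variant was used, and the names and example networks are hypothetical.

```python
def degree_centralization(edges):
    """Freeman degree centralization for an undirected network."""
    ties = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = {}
    for tie in ties:
        for node in tie:
            degree[node] = degree.get(node, 0) + 1
    n = len(degree)
    if n < 3:
        return 0.0  # undefined for fewer than three nodes; report 0
    top = max(degree.values())
    # A star with n nodes reaches the maximum possible sum, (n-1)*(n-2).
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))

# One user connected to three others who are not connected to each other.
star = [("User1", "User2"), ("User1", "User3"), ("User1", "User4")]
# Four users in a ring: everyone has the same number of ties.
ring = [("User1", "User2"), ("User2", "User3"),
        ("User3", "User4"), ("User4", "User1")]

star_value = degree_centralization(star)
ring_value = degree_centralization(ring)
```

The star scores 1 because a single participant holds every tie, and the ring scores 0 because no participant stands out.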
SNA Measures
Macro-level: Modularity
Modularity provides an estimate of whether a network consists of one coherent group of participants who are engaged in the same conversation and paying attention to each other (values closer to 0), or of different conversations and communities with weak overlap (values closer to 1).
Measure          #Election2016    #HongKong
Nodes            491              2570
Edges            1075             2447
Density          0.005 (0.5%)     0.0004 (0.04%)
Diameter         28               14
Reciprocity      0.006 (0.6%)     0.003 (0.3%)
Centralization   0.05             0.11
Modularity       0.42             0.92
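For a given split of users into communities, Newman's modularity compares the share of ties that fall inside communities with the share expected if ties were placed at random. The sketch below assumes that formula on undirected ties and takes the community labels as input. The detection step a tool would run to find those communities is not shown, and all names here are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity of an undirected network for given labels."""
    ties = {frozenset(e) for e in edges if e[0] != e[1]}
    m = len(ties)
    if m == 0:
        return 0.0
    inside = {}  # ties with both ends in the same community
    degree = {}  # total degree of each community's members
    for tie in ties:
        a, b = tuple(tie)
        ca, cb = community[a], community[b]
        degree[ca] = degree.get(ca, 0) + 1
        degree[cb] = degree.get(cb, 0) + 1
        if ca == cb:
            inside[ca] = inside.get(ca, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Two hypothetical groups of three users, joined by a single tie.
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a3", "b1")]
two_groups = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
one_group = dict.fromkeys(two_groups, "A")

split_value = modularity(edges, two_groups)
single_value = modularity(edges, one_group)
```

Labelling the two triangles as separate communities gives 5/14 (about 0.36), while treating everyone as one group gives 0, in line with the reading of low and high values given above.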
Practice with Netlytic + R
Twitter hashtag:
#HongKong
Instructions at
http://bit.ly/hknet15