
Social Media Data Collection & Network Analysis with Netlytic and R

Date post: 15-Jan-2017
Upload: anatoliy-gruzd

Social Media Data Collection & Network Analysis with Netlytic and R

Anatoliy Gruzd
@gruzd

Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University

HKBU, Hong Kong

Dec 3, 2015

Twitter: @gruzd ANATOLIY GRUZD 1

Research at the Social Media Lab

Presentation Slides

http://bit.ly/hk15slides


Social media sites have become an integral part of our daily lives!

Growth of Social Media Data

• Facebook: 1.5B users
• Instagram: 400M users
• Twitter: 300M users

Decision making in domains such as politics, health care and education


How to Make Sense of Social Media Data?

Self-collected/reported

Public APIs

Data Resellers

How to Make Sense of Social Media Data? Big Data Technology

Credit: Nathan Lapierre

Social Media Analytics Tools
http://socialmedialab.ca/apps/social-media-toolkit/

Data -> Visualizations -> Understanding

How to Make Sense of Social Media Data?

How to Make Sense of Social Media Data? Example: Geo-based Analysis

Geography of Twitter Networks

How to Make Sense of Social Media Data? Example: Geo-based + Content Analysis

Tracking Hate Speech on Twitter

Source: http://www.fenuxe.com/tag/geo-coded

Social Network Analysis (SNA)

• Nodes = People

• Edges/ties (lines) = Relations: “Who retweeted/replied/mentioned whom”

How to Make Sense of Social Media Data?

Advantages of Social Network Analysis

Makes it much easier to understand what is going on in a group.

Once the network is discovered, we can find out:

• How people interact with each other,

• Who the most/least active members are,

• Who is influential in a group,

• Who is susceptible to being influenced,

etc.

[Figure: network visualization with node groups labeled Liberal, Conservative, NDP, Green, Bloc, Left, Other, Unknown & Undecided, and Spam]

Gruzd, A. and Roy, J. (2014). Political Polarization on Social Media: Do Birds of a Feather Flock Together on Twitter? Policy & Internet.

Common approach for collecting social network data:

• Self-reported social network data may not be available/accurate

• Surveys or interviews

Problems with surveys or interviews:

• Time-consuming

• Questions can be too sensitive

• Answers are subjective or incomplete

• Participants can forget people and interactions

• Different people perceive events and relationships differently

How Do We Collect Information About Online Social Networks?


Studying Online Social Networks

http://www.visualcomplexity.com/vc

Forum networks

Blog networks

Friends’ networks (Facebook, Twitter, Google+, etc.)

Networks of like-minded people (YouTube, Flickr, etc.)

Goal: Automated Networks Discovery

Challenge: Figuring out what content-based features of online interactions can help to uncover nodes and ties between group members

How Do We Collect Information About Online Social Networks?


Automated Discovery of Social Networks

[Figure: emails exchanged among Nick, Rick and Dick]

• Nodes = People

• Ties = “Who talks to whom”

• Tie strength = The number of messages exchanged between individuals
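
The node/tie/tie-strength definitions above can be sketched in a few lines of Python. This is only an illustration: the `messages` list and the `build_ties` helper are hypothetical and are not part of Netlytic or of the original slides.

```python
from collections import Counter

# Hypothetical message log: one (sender, recipient) pair per email.
messages = [
    ("Nick", "Rick"),
    ("Rick", "Nick"),
    ("Nick", "Dick"),
    ("Nick", "Rick"),
]

def build_ties(messages):
    """Count messages exchanged per unordered pair of people (tie strength)."""
    ties = Counter()
    for sender, recipient in messages:
        if sender != recipient:  # ignore self-addressed messages
            ties[frozenset((sender, recipient))] += 1
    return ties

ties = build_ties(messages)
# Nodes are simply everyone who appears in at least one tie.
nodes = sorted({person for pair in ties for person in pair})
```

Using an unordered pair as the key treats "messages exchanged between individuals" as a two-way count; a directed network would key on `(sender, recipient)` instead.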


Automated Discovery of Social Networks

“Many to Many” Communication

Chat
Mailing listserv
Forum Comments

[Figure: Twitter network among @John, @Peter and @Paul]

• Nodes = People

• Ties = “Who retweeted/replied/mentioned whom”

• Tie strength = The number of retweets, replies or mentions

Automated Discovery of Social Networks: Twitter Networks

Automated Discovery of Social Networks

Twitter Data Examples

Network Ties

@Cheeflo -> @JoeProf
@Cheeflo -> @VMosco
@JoeProf -> @VMosco

Network Tie

@Gruzd -> @SidneyEve

Connection type: Mention

Connection type: Reply
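
As a rough sketch of how ties like these could be derived from tweet text, the Python below classifies each @username in a tweet as a retweet, reply or mention. The tweet texts, the `extract_ties` helper and the classification rules are assumptions made for illustration. The slides do not describe Netlytic's actual parsing, and a real collector may rely on metadata supplied with each tweet rather than on the text alone.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def extract_ties(author, text):
    """Return (source, target, connection_type) ties found in one tweet."""
    ties = []
    for position, name in enumerate(MENTION.findall(text)):
        if name.lower() == author.lower():
            continue  # skip self-references
        if position == 0 and text.startswith("RT @"):
            kind = "retweet"
        elif position == 0 and text.startswith("@"):
            kind = "reply"
        else:
            kind = "mention"
        ties.append((author, name, kind))
    return ties

# Invented tweet texts that reproduce the three ties shown above.
tweets = [
    ("Cheeflo", "@JoeProf have you seen the talk by @VMosco?"),
    ("JoeProf", "RT @VMosco: slides are online"),
]
all_ties = [tie for author, text in tweets for tie in extract_ties(author, text)]
# Tie strength: number of retweets, replies or mentions per directed pair.
strength = Counter((source, target) for source, target, _ in all_ties)
```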

Sample Twitter Searches

#ELECTION2016 #HONGKONG

3557 records (Dec 3, 2015)
1394 records (Oct 29, 2015)

What do these visualizations tell us?

SNA Measures

Micro-level

• In-degree centrality

• Out-degree centrality

• Betweenness centrality

• Other centrality measures (e.g., closeness, eigenvector)

Macro-level

• Density

• Diameter

• Reciprocity

• Centralization

• Modularity

SNA Measures: Micro-level

In-degree suggests “prestige”, highlighting the Twitter users who are mentioned or replied to most often
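
A minimal Python sketch of in-degree, using the three example ties from the Twitter data example earlier in these slides. The `in_degree` helper is a hypothetical name, and a tool such as Netlytic may weight or filter ties differently.

```python
from collections import Counter

# Directed ties (source, target) from the Twitter example in these slides.
edges = [("Cheeflo", "JoeProf"), ("Cheeflo", "VMosco"), ("JoeProf", "VMosco")]

def in_degree(edges):
    """Count how many ties point at each user."""
    counts = Counter()
    for source, target in edges:
        counts[target] += 1
        counts[source] += 0  # make sure users with no incoming ties appear
    return counts

in_deg = in_degree(edges)
most_mentioned = in_deg.most_common(1)[0][0]
```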

In-degree centrality: #HongKong Twitter network

SEVENTEEN, or SVT, is a South Korean boy group formed by Pledis Entertainment

Out-degree reveals active Twitter users with a good awareness of others in the network
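
Out-degree can be sketched the same way, here counting the distinct users each account addresses. The `out_degree` helper is hypothetical, and counting distinct targets rather than total messages is an assumption of this sketch.

```python
from collections import defaultdict

# Directed ties (source, target) from the Twitter example in these slides.
edges = [("Cheeflo", "JoeProf"), ("Cheeflo", "VMosco"), ("JoeProf", "VMosco")]

def out_degree(edges):
    """Map each user to the number of distinct users they addressed."""
    targets = defaultdict(set)
    for source, target in edges:
        targets[source].add(target)
        targets[target]  # register users who never address anyone
    return {user: len(addressed) for user, addressed in targets.items()}

out_deg = out_degree(edges)
most_active = max(out_deg, key=out_deg.get)
```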

Out-degree centrality: #HongKong Twitter network

Note: A music fan (many retweets & replies to others)

Betweenness shows actors who lie on the largest number of information paths and who often connect different groups of users in the network
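
One common way to compute betweenness is Brandes' algorithm. The sketch below is a plain-Python version for unweighted graphs, run on an invented five-user network in which a single "fan" account bridges two pairs of users. The node names and the `betweenness` helper are illustrative assumptions, not Netlytic's implementation.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm for unweighted graphs.

    adjacency maps each node to its neighbors. Scores are summed over
    ordered (source, target) pairs, so halve them for undirected graphs.
    """
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Invented undirected network: a2 - a1 - fan - b1 - b2.
network = {
    "a2": ["a1"],
    "a1": ["a2", "fan"],
    "fan": ["a1", "b1"],
    "b1": ["fan", "b2"],
    "b2": ["b1"],
}
scores = {user: value / 2 for user, value in betweenness(network).items()}
bridge = max(scores, key=scores.get)
```

In this toy network every shortest path between the two sides runs through `fan`, which is why it receives the highest score.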

Betweenness centrality: #HongKong Twitter network

Note: A fan (retweets/replies to messages from two different fan communities/sites)

SNA Measures: Macro-level

Density indicates the overall connectivity in the network (the total number of connections divided by the total number of possible connections).

It is equal to 1 when everyone is connected to everyone.
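
The density formula above can be written as a short Python function. Whether ties are counted as directed or undirected is not stated in the slides, so the `directed` flag below is an assumption of this sketch.

```python
def density(node_count, edge_count, directed=False):
    """Connections present divided by connections possible."""
    if node_count < 2:
        return 0.0
    possible = node_count * (node_count - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2  # each unordered pair is counted once
    return edge_count / possible

# Three users who are all connected to one another (three undirected ties).
fully_connected = density(3, 3)
```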

[Figure: User1, User2 and User3 all connected to one another; Density = 1]

Diameter gives a general idea of how “wide” the network is: it is the longest of the shortest paths between any two nodes in the network.
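
A minimal sketch of this definition in Python: run a breadth-first search from every node and keep the longest shortest path found. The four-user chain is an invented example, and only pairs that can reach each other are considered, which is an assumption about how disconnected networks are handled.

```python
from collections import deque

def bfs_distances(adjacency, start):
    """Shortest path length (in hops) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(adjacency):
    """Longest of the shortest paths between any two reachable nodes."""
    return max(max(bfs_distances(adjacency, node).values()) for node in adjacency)

# Invented chain: User1 - User2 - User3 - User4.
chain = {
    "User1": ["User2"],
    "User2": ["User1", "User3"],
    "User3": ["User2", "User4"],
    "User4": ["User3"],
}
width = diameter(chain)
```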

[Figure: network of User1, User2, User3 and User4 with three hops labeled #1, #2 and #3; Diameter = 3]

Reciprocity shows how many online participants are having two-way conversations.

In a scenario when everyone replies to everyone, the reciprocity value will be 1.
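
Reciprocity has more than one definition in the literature. The sketch below uses the share of directed ties that are returned, which is an assumption, since the slides do not say which variant Netlytic reports.

```python
from itertools import permutations

def reciprocity(edges):
    """Fraction of directed ties whose reverse tie also exists."""
    edge_set = {(s, t) for s, t in edges if s != t}  # drop self-loops
    if not edge_set:
        return 0.0
    mutual = sum(1 for s, t in edge_set if (t, s) in edge_set)
    return mutual / len(edge_set)

# Everyone replies to everyone among four users.
users = ["User1", "User2", "User3", "User4"]
everyone_replies = list(permutations(users, 2))

# One-way ties only, as in the Twitter example earlier in these slides.
one_way = [("Cheeflo", "JoeProf"), ("Cheeflo", "VMosco"), ("JoeProf", "VMosco")]
```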

[Figure: network of User1, User2, User3 and User4; Reciprocity = 1]

Centralization indicates whether a network is dominated by few central participants (values are closer to 1),

or whether more people are contributing to discussion and information dissemination (values are closer to 0).
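
One standard way to quantify this is Freeman's degree centralization, sketched below for undirected networks. The slides do not say which centralization variant Netlytic reports, so treat the formula choice and the `degree_centralization` helper as assumptions.

```python
from collections import Counter

def degree_centralization(edges):
    """Freeman degree centralization for an undirected network.

    Assumes each tie is listed once and that every node appears in at
    least one tie (isolated nodes are not represented in an edge list).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    if n < 3:
        raise ValueError("need at least three connected nodes")
    top = max(degree.values())
    # Sum of gaps to the most central node, divided by the largest
    # possible sum, which occurs in a star network.
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))

# Star: User1 is tied to everyone else, and nobody else is tied to each other.
star = [("User1", "User2"), ("User1", "User3"), ("User1", "User4")]
# Ring: every user has exactly two ties.
ring = [("User1", "User2"), ("User2", "User3"), ("User3", "User4"), ("User4", "User1")]
```

The star gives the maximum value because one participant holds every tie, while the ring gives the minimum because all participants are equally connected.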

[Figure: network of User1, User2, User3 and User4; Centralization = 1]

Modularity provides an estimate of whether a network consists of one coherent group of participants who are engaged in the same conversation and who are paying attention to each other (values closer to 0);

or whether a network consists of different conversations and communities with a weak overlap (values closer to 1).
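
For a given split of users into communities, modularity can be computed with the widely used Newman formulation, sketched below for undirected networks. The toy network and community labels are invented, and the sketch takes the communities as given, whereas an analysis tool would first have to detect them.

```python
from collections import Counter

def modularity(edges, community_of):
    """Newman modularity of an undirected network for a fixed partition."""
    m = len(edges)
    internal = Counter()    # ties with both ends inside the community
    degree_sum = Counter()  # total degree of the community's members
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles (a, b, c) and (d, e, f) joined by a single tie c - d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = dict.fromkeys("abcdef", 1)

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Treating the whole network as one group yields 0, while separating the two triangles yields a clearly positive value, in line with the interpretation given above.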

                  #Election2016     #HongKong
Nodes             491               2570
Edges             1075              2447
Density           0.005 (0.5%)      0.0004 (0.04%)
Diameter          28                14
Reciprocity       0.006 (0.6%)      0.003 (0.3%)
Centralization    0.05              0.11
Modularity        0.42              0.92

Practice with Netlytic + R


Twitter hashtag:

#HongKong

Instructions at

http://bit.ly/hknet15

