
Exploring Social Media with NodeXL

Date post: 15-Jul-2015
Upload: shalin-hai-jew
Exploring Social Media with NodeXL (Updated April 15, 2016)
Page 1: Exploring Social Media with NodeXL

Exploring Social Media with NodeXL

(Updated April 15, 2016)

Page 2: Exploring Social Media with NodeXL

One Goal Today

• What are the capabilities of NodeXL, and do I have a use for it (for research, for exploration, for fun, or some mix of these)?

2

Page 3: Exploring Social Media with NodeXL

Overview

1. Network graphs and related terminology

2. Potential uses in research

3. NodeXL (Network Overview, Discovery, and Exploration for Excel)

4. Social media platforms

5. Data extraction runs

6. Data processing

3

Page 4: Exploring Social Media with NodeXL

Overview (cont.)

7. Network graph data visualizations

8. NodeXL Graph Gallery and the virtual community

9. Beyond NodeXL

10. Presentation review

11. Some general takeaways about network analysis using social media data

12. Questions? Comments?

4

Page 5: Exploring Social Media with NodeXL

1. Network Graphs and Related Terminology

A Very Brief Overview

5

Page 6: Exploring Social Media with NodeXL

6

Page 7: Exploring Social Media with NodeXL

Underlying Data in Matrices

7

Page 8: Exploring Social Media with NodeXL

Underlying Data in Matrices (cont.)

• May show whether a relationship exists or not (binary)

• May show strength (or intensity) of a relationship

• May show the direction of a relationship (one-way, two-ways/reciprocated)

• …and other information

8
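As a minimal sketch (in Python, with hypothetical account names), the matrix view can be built from an edge list of (source, target, weight) tuples. A nonzero cell records that a tie exists, its value records strength, and comparing cell (i, j) with cell (j, i) shows direction and reciprocation. This illustrates the idea only; it is not how NodeXL stores its worksheets.

```python
def adjacency_matrix(nodes, edges, directed=True):
    """Build a weighted adjacency matrix from (source, target, weight) tuples."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        i, j = index[source], index[target]
        matrix[i][j] += weight  # strength: repeated ties accumulate
        if not directed and i != j:
            matrix[j][i] += weight  # undirected ties are mirrored
    return matrix


def binary_view(matrix):
    """1 if a relationship exists in that cell, else 0."""
    return [[1 if cell else 0 for cell in row] for row in matrix]


def reciprocated_pairs(nodes, matrix):
    """Pairs whose ties run in both directions (two-way relationships)."""
    return [
        (nodes[i], nodes[j])
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if matrix[i][j] and matrix[j][i]
    ]


nodes = ["ana", "ben", "cai"]
edges = [("ana", "ben", 3), ("ben", "ana", 1), ("ana", "cai", 2)]
weighted = adjacency_matrix(nodes, edges)
print(weighted)                             # [[0, 3, 2], [1, 0, 0], [0, 0, 0]]
print(binary_view(weighted))                # [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
print(reciprocated_pairs(nodes, weighted))  # [('ana', 'ben')]
```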

Page 9: Exploring Social Media with NodeXL

Unit of Analysis

Global Network

• Global network measures: Indicators of the types of interrelated communities being observed

• Inferences about the state of the community

• Inferences about how power moves, how information moves

• Inferences about who is influential and how

• Predictive analytics about where this community is going

• Overlapping networks

About Online Global Networks

• Central masses continue for a time

• Small clusters either meld with larger ones, or they eventually disappear

• Often held together for a time through charismatic leaders

• Isolates and pendants usually disappear over time

• Dynamism is a part of all networks

9

Page 10: Exploring Social Media with NodeXL

Unit of Analysis (cont.)

Nodes

• Node-level measures: Indicators of the egos

• Inferences about the ego even if it is “invisible” based on its effect on the surrounding egos and entities

About Online Nodes

• Cyber selves somewhat representational of the real-world selves

• Messaging / location / imagery / profiles may be analyzed to infer personality and interests

• Popularity falling under a power law (a few stars garnering most of the attention, the rest in the long tail of social aspirants and poseurs)

10

Page 11: Exploring Social Media with NodeXL

Statistical Measures

Global Network Measures

• Betweenness centrality: The number (or share) of shortest paths between all other pairs of nodes that pass through a given node (information tends to move along shortest paths and close ties); indicates how much of a bridge a node is for network connectivity

• Closeness centrality: Geodesic (shortest-path) distance between a node and every other node (farness as the sum of all distances to all other nodes; closeness as the inverse of farness). In an undirected graph, this is the distance to all other nodes; in a directed graph, distances to a node may be more meaningful, because a node has little control over its in-coming links

Node-level (Local) Measures

• Degree centrality: In-degree and out-degree (relative popularity within the network)

• Clustering coefficient: Embeddedness of a single node in cliques or in its ego neighborhood with its alters (the share of pairs of a node's neighbors that are themselves connected)

11
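The measures above can be sketched in Python for a small, connected, undirected, unweighted graph stored as a dict of neighbor sets (node names are hypothetical). Closeness follows the slide's definition (the inverse of farness), and betweenness uses Brandes' algorithm, a standard way to accumulate shortest-path counts. NodeXL's own implementation and any normalization it applies may differ.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness(adj):
    """Inverse of farness (sum of distances to all other nodes); assumes a connected graph."""
    result = {}
    for v in adj:
        farness = sum(bfs_distances(adj, v).values())
        result[v] = 1 / farness if farness else 0.0
    return result


def betweenness(adj):
    """Brandes' algorithm: shortest paths between other pairs that pass through each node."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # undirected: each pair counted twice


def local_clustering(adj):
    """Share of a node's neighbor pairs that are themselves connected."""
    result = {}
    for v, nbrs in adj.items():
        pairs = list(combinations(nbrs, 2))
        linked = sum(1 for a, b in pairs if b in adj[a])
        result[v] = linked / len(pairs) if pairs else 0.0
    return result


adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub", "d"},
    "d": {"c"},
}
print(degree(adj))       # {'hub': 3, 'a': 2, 'b': 2, 'c': 2, 'd': 1}
print(betweenness(adj))  # {'hub': 4.0, 'a': 0.0, 'b': 0.0, 'c': 3.0, 'd': 0.0}
```

Here "c" is a low-degree node with high betweenness, because every path to the pendant "d" must cross it.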

Page 12: Exploring Social Media with NodeXL

Statistical Measures (cont.)

Global Network Measures

• Eigenvector centrality: A measure of relative influence in a graph in which a node's score depends on the scores of the nodes it is connected to, so ties to higher-value or popular nodes result in a higher value (often rescaled to values between 0 and 1)

• Clustering coefficient: The overall tendency of nodes to aggregate (based on similarity, like co-occurrence, or on connectivity), expressed visually as proximity or closeness; may be measured as transitivity (the share of connected triples of nodes that close into triangles)

Motif Measures

• Dyads, triads, and other structured sub-groupings

• Local and experiential for the nodes in terms of structured connections

• May (as with fractals) or may not reflect the overall structure

• Global motif censuses (counts of occurrences of various types of motif structures in a whole network)

• Structural holes as indicators of potential openings for nodes and links (to build resilience)

12
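A minimal Python sketch of two of these global measures, assuming a small connected undirected graph (hypothetical node names). Eigenvector centrality is computed by power iteration and rescaled so the top node scores 1; the (A + I) shift is a common trick to avoid oscillation and is an assumption here, not NodeXL's documented method. Transitivity counts closed triples out of all connected triples.

```python
from itertools import combinations


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration: a node's score grows with the scores of its neighbors."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        # Multiply by (A + I): same leading eigenvector as A, but no oscillation
        # on bipartite graphs. Assumes the graph is connected.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(nxt.values())
        nxt = {v: s / top for v, s in nxt.items()}  # largest score becomes 1
        if max(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x


def transitivity(adj):
    """Global clustering coefficient: closed share of all connected triples."""
    closed = triples = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):  # each triple centered on v
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub", "d"},
    "d": {"c"},
}
print(transitivity(adj))  # 0.5
scores = eigenvector_centrality(adj)
print(max(scores, key=scores.get))  # hub
```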

Page 13: Exploring Social Media with NodeXL

Different Network Graphs

Social Network Graphs

• Entities and interrelationships

• Follower – following (formally declared relationships)

• Parasocial relationships online

• Weak ties, fragile linkage

• Reply message / retweet / reply video / likes / comment-to and others (situationally created relationships, empirical)

• Entities and contents

• Entities and events

Content Network Graphs

• Based on content similarity

• Based on content proximity to other terms (based on different-sized “windows” moved across a text)

• Co-occurring terms (content) or tags (metadata)

• Scraped thumbnail images

• May be based on pre-structured content “thesauri” or may be extracted and structured in an emergent way from text corpora (or individual texts)

13
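The proximity-based content networks described above can be sketched in Python as follows, assuming simple lowercase whitespace tokenization and a hypothetical `window` size (the number of consecutive tokens considered together). Each pair of distinct terms appearing inside the same window becomes an edge, weighted by how often that happens.

```python
from collections import Counter


def cooccurrence_edges(text, window=2):
    """Count unordered term pairs that appear within `window` consecutive tokens."""
    tokens = text.lower().split()
    counts = Counter()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                counts[tuple(sorted((term, other)))] += 1
    return counts


edges = cooccurrence_edges("Open data open tools open data", window=2)
print(edges.most_common())  # [(('data', 'open'), 3), (('open', 'tools'), 2)]
```

Widening the window links terms that are farther apart and makes the resulting graph denser.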

Page 14: Exploring Social Media with NodeXL

Some Related Terminology

• Structure-mining: The study of networks and interrelationships in order to make inferences about systems (structure as a “topology” or a “map”)

• Graph: A data visualization of interrelationships (in either 2D or 3D), including node-link diagrams (usually without fixed x- and y-axes but spatially relational via Euclidean distance)

• Undirected graph: A graph in which the relationship between nodes is associational, without arrows at the ends

• Directed graph (digraph): A graph in which the relationship between nodes is directional, with the potential for arrows at the ends

14

Page 15: Exploring Social Media with NodeXL

Some Related Terminology (cont.)

• Sociogram / sociograph: A network graph showing interrelationships between social entities

• Degree: The proximity of a relationship to the focal node, in terms of directness of ties (such as a 1-, 1.5-, or 2-degree network); as a vertex measure, degree is the number of ties a node has

• In-degree: The number of relationships coming in to a vertex or node

• Out-degree: The number of relationships going out from a vertex or node

15
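A minimal Python sketch of in-degree and out-degree, assuming directed ties stored as (source, target) pairs, here hypothetical "follows" relationships where the source follows the target.

```python
from collections import Counter


def in_out_degree(edges):
    """Count relationships coming in to and going out from each vertex."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


# Hypothetical "follows" ties: (follower, followed)
follows = [("ana", "ben"), ("cai", "ben"), ("ben", "ana"), ("dev", "ben")]
in_deg, out_deg = in_out_degree(follows)
print(in_deg["ben"], out_deg["ben"])  # 3 1
```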

Page 16: Exploring Social Media with NodeXL

Some Related Terminology (cont.)

• Motifs: Various types of structures of relationships between nodes in dyadic, triadic, and other polyadic configurations

• Clusters: Densely connected groups (subgraphs) in a network, including islands

• Isolates: Nodes in a network that are not directly connected to any other node

• Pendants or whiskers: A node connected to a network by only one relationship (link, edge)

16
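The simplest motif census, and the isolate and pendant definitions, can be sketched in Python for a directed edge list (hypothetical names). The dyad census classifies every pair of nodes as mutual, asymmetric, or null; isolates and pendants are found on the undirected view, ignoring self-loops.

```python
from itertools import combinations


def dyad_census(nodes, edges):
    """Classify every pair of nodes as mutual, asymmetric, or null (no tie)."""
    ties = set(edges)
    census = {"mutual": 0, "asymmetric": 0, "null": 0}
    for u, v in combinations(nodes, 2):
        forward, backward = (u, v) in ties, (v, u) in ties
        if forward and backward:
            census["mutual"] += 1
        elif forward or backward:
            census["asymmetric"] += 1
        else:
            census["null"] += 1
    return census


def isolates_and_pendants(nodes, edges):
    """Isolates have no ties; pendants have exactly one neighbor (direction ignored)."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    isolates = sorted(v for v, n in neighbors.items() if not n)
    pendants = sorted(v for v, n in neighbors.items() if len(n) == 1)
    return isolates, pendants


nodes = ["ana", "ben", "cai", "dev", "eli"]
edges = [("ana", "ben"), ("ben", "ana"), ("ana", "cai"), ("cai", "dev")]
print(dyad_census(nodes, edges))            # {'mutual': 1, 'asymmetric': 2, 'null': 7}
print(isolates_and_pendants(nodes, edges))  # (['eli'], ['ben', 'dev'])
```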

Page 17: Exploring Social Media with NodeXL

Some Related Terminology (cont.)

• Bridging nodes: Nodes on the periphery of multiple social networks that connect them in ways that would not otherwise exist (and so are influential even if they are peripheral in the respective networks)

• Core-periphery dynamic: A concept of power and influence with those closest in to the core considered as most influential and those on the periphery as less so

• Graph diameter: The longest shortest-path (geodesic) distance between any two nodes in a network, counted in hops (links traversed)

17
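Graph diameter, and the related average geodesic distance that NodeXL also reports, can be sketched in Python with breadth-first search on an unweighted, undirected graph (hypothetical node names). Only reachable pairs are counted here; tools may treat disconnected pairs or self-pairs differently.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter_and_average(adj):
    """Longest and mean shortest-path distance over all reachable node pairs."""
    lengths = []
    for source in adj:
        lengths.extend(d for target, d in bfs_distances(adj, source).items() if target != source)
    if not lengths:
        return 0, 0.0
    # Each unordered pair appears twice, which leaves both max and mean unchanged
    return max(lengths), sum(lengths) / len(lengths)


adj = {"a": {"b"}, "b": {"a", "c", "e"}, "c": {"b", "d"}, "d": {"c"}, "e": {"b"}}
print(diameter_and_average(adj))  # (3, 1.8)
```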

Page 18: Exploring Social Media with NodeXL

Affordances of Electronic Social Network Analysis (E-SNA)

• Plenty of theory and current research

• Social Networks (journal, Elsevier)

• Structure-mining (relational topologies) and content-mining (text analysis, cultural analysis)

• Micro-, meso-, and macro-levels of analysis (zooming in and out for different levels of granularity): nodes / entities and links / relationships; motifs, clusters, and branches; entire networks

• Part of “network science”

18

Page 19: Exploring Social Media with NodeXL

2. Potential Uses in Research

19

Page 20: Exploring Social Media with NodeXL

General Research Possibilities

• Social media account profiling (through inferential analysis, data leakage, de-aliasing to personally identifiable information or “PII”)

• Trending online conversations (by #hashtag, by keyword); human sensor networks

• Identification of the “mayors of the hashtag” (per Dr. Marc A. Smith of SMRF)

• Public mindset on a topic (through both direct and indirect analysis, including related-tags networks from “free-form” folk tagging / folksonomies)

20

Page 21: Exploring Social Media with NodeXL

Some General Research Possibilities (cont.)

• Eventgraphing, event detection and monitoring, and event postmortems

• Reverse-engineering a social-mediated (political, marketing, fund-raising, or other) campaign; semi-live-tracking a social-mediated campaign

• Discovery of artificial accounts (including AI social bots); some application to potential fraud analysis

• The “company you keep” concept

21

Page 22: Exploring Social Media with NodeXL

Some General Research Possibilities (cont.)

• Geolocational applications: location -> messaging; messaging -> location

• “Oppo” (opposition) research (such as for political campaigns) through open-source intelligence (OSINT)

• Messaging: broadscale themes and particulars

• Inter-relationships

22

Page 23: Exploring Social Media with NodeXL

General Research Sequence

23

Page 24: Exploring Social Media with NodeXL

A Simplified Research Sequence of Extracting and Analyzing Social Media Information with NodeXL

1. Research question / open exploration / mixed intent

2. Social media strategy: social media platform(s), seeding term(s), and data extraction parameters

3. Data extractions using NodeXL

4. Data processing

5. Data visualizations

6. Data analysis (in NodeXL)

7. Data analysis (outside of NodeXL)

24

Page 25: Exploring Social Media with NodeXL

Data Limitations

• Limited data sets (with no knowledge of the “N of all,” at least not without insider access)

• “Recent” data only (usually reverse listed from present to the past)

• Rate-limited data extractions

• Time-dependent data (with hidden dependencies)

25

Page 26: Exploring Social Media with NodeXL

Data Limitations (cont.)

• Reliance on (often noisy) textual descriptions of multimedia contents

• Inherent “noise” in metadata, content labeling, content descriptions, tagging, and related online conversations

• Sparse geolocational data in microblogging messages and in uploaded imagery / videos (in terms of “exchangeable image file format” or “EXIF” data)

26

Page 27: Exploring Social Media with NodeXL

Local NodeXL Mitigations to Data Limitations

• Re-running the data extraction on different machines but with the same parameters at the same time

• Running the data extractions at slightly different times

• Running multiple and different data visualizations on the same dataset

• Using multiple seeding terms for a particular issue

27

Page 28: Exploring Social Media with NodeXL

Using an N = All

• Capturing an N = all through Gnip (a company now owned by Twitter) or a similar company (unless Gnip has an exclusivity contract)

• Working directly with the company or organization behind the social media platform (particularly their research divisions), but research may be embargoed (restricted from any release or publication)

28

Page 29: Exploring Social Media with NodeXL

Using Proper Research Practices

• Posing research questions in strategic ways: Ask ambitiously, but do not over-claim from results

• Respecting the research traditions and methods of the respective domain or field

• Applying serious efforts at (dis)confirmation of findings

29

Page 30: Exploring Social Media with NodeXL

Using Proper Research Practices (cont.)

• Capturing multiple streams of data (often in a cross-platform way)

• Documenting all data extraction parameters, data processing, and data provenance issues

• Using multiple analytical tools to analyze the captured data

• Comparing cyber info with real-world info (determining where the cyber-physical confluence lies)

• Using accurate qualifiers for the presented data

30

Page 31: Exploring Social Media with NodeXL

3. NodeXL

Network Overview, Discovery, and Exploration for Excel

31

Page 32: Exploring Social Media with NodeXL

NodeXL “Template”

Brief History

• Formerly known as .NetMap

• First released in July 2008 as an add-on to Microsoft Excel

• Available at the Microsoft CodePlex site

• Supported by the Social Media Research Foundation (SMRF) with the tagline “Open Tools, Open Data, Open Scholarship for Social Media”

• Third-party data importers for NodeXL, available through links integrated into NodeXL

APIs and Add-ons

• Application programming interface (API): Protocols for building software applications that interact with (in this case) public-facing social media platform databases

• Add-on: An addition to a software program to add functionality

32

Page 33: Exploring Social Media with NodeXL

Workspace

33

Page 34: Exploring Social Media with NodeXL

4. Social Media Platforms

34

Page 35: Exploring Social Media with NodeXL

Social Media

• Integrated online sites and applications that enable people to …

• Interact

• Inter-communicate

• Share information, digital artifacts and objects, materials, funds, and other elements

• Collaborate (co-create knowledge, fund-raise, support, and others)

• Create continuing and long-term profiles

35

Page 36: Exploring Social Media with NodeXL

Web 2.0 / the Social Web

• Microblogging sites: Twitter, Sina Weibo

• Social networking sites: Facebook, LinkedIn

• Wikis: Wikipedia (with a MediaWiki understructure)

• Video sharing: YouTube, Vimeo

• Image-sharing: Flickr

• Blogs: WordPress (understructure)

• Email:

• Short message service (SMS):

• and others

36

Note: It helps to immerse oneself in each platform and observe how users use it and how the platform’s community responds to in-world events. It also helps to challenge assumptions about how things actually work vs. how one assumes they work.

Page 37: Exploring Social Media with NodeXL

Social Media Accessible via NodeXL

• Facebook Fan Page Network

• Facebook Personal Network

• Flickr Related Tags Network

• Flickr User’s Network

• MediaWiki Page Network*

• Twitter Search Network (#hashtag, keyword, other)

• Twitter User’s Network (@account, @group)

• Web 1.0 / Blog Network (via VOSON / “Virtual Observatory for the Study of Online Networks”)*

• YouTube User’s Network

• YouTube Video Network (topic)

• [3rd party graph data importers]*

37

Page 38: Exploring Social Media with NodeXL

Social Media Account Types

• Social media accounts

• Public or private accounts

• Individual or group (often topic-focused) accounts

• Human, cyborg, ‘bot (including socialbots)

38

Page 39: Exploring Social Media with NodeXL

Application Programming Interfaces (APIs)

• Application Programming Interfaces (APIs) enabling access to some limited data from the social media platforms

• Often rate-limited by the social media platform

• Enables downloading of a percentage of the available public data (full amount of dataset not indicated by the API)

• Data released by content creators through the end user license agreements (EULAs)

• Data scraping also possible

39

Page 40: Exploring Social Media with NodeXL

Application Programming Interfaces (APIs) (cont.)

• Access requires an email-verified account that is “whitelisted” for data access (which enables the platform’s rate limiting)

• Some (like Flickr) require an API key and a secret

• Terms of access change, and developers may not update the software quickly enough to maintain access

40

Page 41: Exploring Social Media with NodeXL

Types of Social Media Data Available

NodeXL

• Topical slice-in-time; dynamic and continuous (for a certain period of time) (on Twitter)

• Protected user accounts in Facebook (with log-in authentication into Facebook)

• Public-facing user accounts in Flickr, YouTube, and fan accounts in Facebook

• Article edits in Wikipedia

Others

• Tweetstreams going back in time (up to about 3,000 per account) (NCapture in NVivo, on Twitter)

• Geomapping of Tweets (NCapture in NVivo, on Twitter)

• Links between accounts on social media platforms to the Surface Web (Maltego Chlorine 3.6.0)

41

Page 42: Exploring Social Media with NodeXL

5. Data Extraction Runs

42

Page 43: Exploring Social Media with NodeXL

General Parameters of a Data Extraction

• Seeding term(s)

• Boolean data types (sets) [# and #; # and keyword; tag and tag]

• Type of social or content network (or two-mode / bipartite or multi-mode networks)

• Degree of network (1, 1.5, or 2)

• Number of vertices, messages, or videos (size of network), and others

43
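One way to picture the 1, 1.5, and 2 degree settings is as filters on an edge list already in hand, sketched here in Python with hypothetical account names. This is an assumption for illustration: NodeXL performs the actual extraction against the platform API, and how it treats ties at the 2-degree level may differ from this sketch, which keeps every tie inside the expanded node set.

```python
def ego_network(edges, ego, level):
    """Edges kept in a 1-, 1.5-, or 2-degree network around `ego` (undirected view)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    alters = neighbors.get(ego, set())
    if level == 1:
        # Only the ego's own ties
        return [e for e in edges if ego in e]
    if level == 1.5:
        # Ego, alters, and the ties among the alters
        members = alters | {ego}
    elif level == 2:
        # Adds the alters' alters; every tie inside that node set is kept
        members = alters | {ego}
        for alter in alters:
            members |= neighbors[alter]
    else:
        raise ValueError("level must be 1, 1.5, or 2")
    return [(u, v) for u, v in edges if u in members and v in members]


edges = [("me", "ana"), ("me", "ben"), ("ana", "ben"), ("ben", "cai"), ("cai", "dev")]
for level in (1, 1.5, 2):
    print(level, len(ego_network(edges, "me", level)))  # 2, then 3, then 4 edges
```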

Page 44: Exploring Social Media with NodeXL

6. Data Processing

44

Page 45: Exploring Social Media with NodeXL

45

Page 46: Exploring Social Media with NodeXL

Graph Metrics

• Selection of desired metrics of the extracted graph

• Processed on the local machine

• May have to process in parts and pieces (instead of “select all”) because of machine processing limits (saving after each iteration)

46

Page 47: Exploring Social Media with NodeXL

Graph Metrics (in detail)

• Overall graph metrics

• Vertex degree (undirected graphs only)

• Vertex in-degree (directed graphs only)

• Vertex out-degree (directed graphs only)

• Vertex betweenness and closeness centralities (measures of influence in the network based on “bridging” along shortest paths / transmission / propagation efficiency)

• Vertex eigenvector centrality (a measure of influence in the network based on connectivity to influential or high-scoring nodes)

• Vertex PageRank

• Vertex clustering coefficient

• Vertex reciprocated vertex pair ratio (directed graphs only)

• Edge reciprocation (directed graphs only)

• Group metrics

• Words and word pairs

• Edge creation by shared content similarity

• Top items

• Twitter search network top items

47
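The idea behind edge creation by shared content similarity can be sketched in Python. NodeXL's own similarity measure is not described here, so this sketch swaps in Jaccard similarity over each account's set of words, with a hypothetical threshold and hypothetical accounts: two vertices are linked when the overlap of their vocabularies is large enough.

```python
import re
from itertools import combinations


def similarity_edges(texts, threshold=0.3):
    """Link vertices whose word sets overlap (Jaccard similarity >= threshold)."""
    words = {user: set(re.findall(r"[a-z0-9#@']+", text.lower())) for user, text in texts.items()}
    edges = []
    for u, v in combinations(sorted(words), 2):
        union = words[u] | words[v]
        if not union:
            continue
        score = len(words[u] & words[v]) / len(union)
        if score >= threshold:
            edges.append((u, v, round(score, 3)))
    return edges


texts = {"ana": "NodeXL graph metrics", "ben": "graph metrics in NodeXL", "cai": "lunch plans"}
print(similarity_edges(texts))  # [('ana', 'ben', 0.75)]
```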

Page 48: Exploring Social Media with NodeXL

Resulting Global-View Graph Metrics Table

48

• Vertices

• Unique edges

• Edges with duplicates

• Total edges

• Self-loops

• Reciprocated vertex pair ratio

• Reciprocated edge ratio

• Connected components

• Single-vertex connected components

• Maximum vertices in a connected component

• Maximum edges in a connected component

• Maximum geodesic distance (diameter)

• Average geodesic distance

• Graph density (or sparseness)

• Modularity
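Several rows of this table can be sketched in Python for a directed edge list with duplicates and self-loops (hypothetical vertices). This is one plausible reading, not NodeXL's exact definitions: "unique" edges are taken to be those appearing exactly once, self-loops are counted among edges but excluded from the ratios and density, and components are weakly connected (direction ignored).

```python
from collections import Counter


def overall_metrics(vertices, edges):
    """A few overall graph metrics for a directed edge list (duplicates allowed)."""
    counts = Counter(edges)
    distinct = {e for e in counts if e[0] != e[1]}  # distinct ordered ties, no self-loops
    pairs = {frozenset(e) for e in distinct}
    reciprocated = {frozenset(e) for e in distinct if (e[1], e[0]) in distinct}
    mutual_edges = sum(1 for u, v in distinct if (v, u) in distinct)

    parent = {v: v for v in vertices}  # union-find for weakly connected components

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in distinct:
        parent[find(u)] = find(v)
    sizes = Counter(find(v) for v in vertices)
    n = len(vertices)

    return {
        "vertices": n,
        "unique_edges": sum(1 for c in counts.values() if c == 1),
        "edges_with_duplicates": sum(c for c in counts.values() if c > 1),
        "total_edges": len(edges),
        "self_loops": sum(1 for u, v in edges if u == v),
        "reciprocated_vertex_pair_ratio": len(reciprocated) / len(pairs) if pairs else 0.0,
        "reciprocated_edge_ratio": mutual_edges / len(distinct) if distinct else 0.0,
        "connected_components": len(sizes),
        "single_vertex_components": sum(1 for s in sizes.values() if s == 1),
        "max_vertices_in_component": max(sizes.values(), default=0),
        "graph_density": len(distinct) / (n * (n - 1)) if n > 1 else 0.0,
    }


vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("a", "c"), ("c", "c"), ("d", "c")]
metrics = overall_metrics(vertices, edges)
print(metrics["connected_components"], metrics["graph_density"])  # 2 0.2
```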

Page 49: Exploring Social Media with NodeXL

7. Network Graph Data Visualizations

49

Page 50: Exploring Social Media with NodeXL

Data Visualizations

• Graph Layout Algorithms: Fruchterman-Reingold, Harel-Koren Fast Multiscale, Circle (Ring Lattice Graph), Spiral, Horizontal Sine Wave, Vertical Sine Wave, Grid, Polar, Polar Absolute, Sugiyama, and Random

• Autofill Columns

• Dynamic Filters

• Layout Options

• Graph Options

50
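To show what a force-directed layout such as Fruchterman-Reingold does, here is a simplified Python sketch (not NodeXL's implementation): every pair of vertices repels with force k²/d, every tie attracts with force d²/k, and a cooling "temperature" caps each step so the layout settles inside a unit frame. The seed, iteration count, and cooling rate are arbitrary assumptions, and `nodes` must be a list.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=1):
    """Force-directed layout: ties pull vertices together, all vertices push apart."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # "ideal" distance between vertices
    temperature = width / 10  # largest step a vertex may take per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion k^2 / d between every pair of vertices
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction d^2 / k along each tie
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex (capped by the temperature) and keep it inside the frame
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / length * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / length * step))
        temperature *= 0.95  # cool down so the layout settles

    return {v: (x, y) for v, (x, y) in pos.items()}


layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
```

Because the starting positions are random, different seeds give different (equally valid) pictures of the same data, which is one reason to compare several layouts of one dataset.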

Page 51: Exploring Social Media with NodeXL

51

Page 52: Exploring Social Media with NodeXL

Toggling between the Graph Visualizations and the Underlying Data

Data Cleaning

• Deletion of information from the graph that may not be directly relevant (from the data worksheets)

• De-duplication of messaging (if relevant)

Data Filtering

• Using “Dynamic Filters” to select particular types of data of interest to show in the graph pane: relationship date, Tweet Date (UTC), x-axis, y-axis, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, PageRank, clustering coefficient, reciprocated vertex pair ratio, followed, followers, Tweets, favorites, joined Twitter date (UTC)

• (and UTC degree time to geo-location)

52
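The effect of a dynamic filter can be sketched in Python as a range test on one worksheet column: only vertices whose value falls within the chosen bounds stay visible. The row layout and column names below are hypothetical stand-ins for NodeXL's vertex worksheet columns.

```python
def dynamic_filter(rows, column, low=None, high=None):
    """Return the vertices whose value in `column` lies within [low, high]."""
    kept = []
    for row in rows:
        value = row[column]
        if (low is None or value >= low) and (high is None or value <= high):
            kept.append(row["vertex"])
    return kept


vertices = [
    {"vertex": "ana", "in_degree": 12, "betweenness": 40.0},
    {"vertex": "ben", "in_degree": 3, "betweenness": 0.0},
    {"vertex": "cai", "in_degree": 7, "betweenness": 15.5},
]
print(dynamic_filter(vertices, "in_degree", low=5))              # ['ana', 'cai']
print(dynamic_filter(vertices, "betweenness", low=10, high=20))  # ['cai']
```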

Page 53: Exploring Social Media with NodeXL

Dynamic Filters

53

Page 55: Exploring Social Media with NodeXL

NodeXL Graph Gallery

• Set up as a place for shared research about social network graphs

• Includes experimental interactive versions of the graphs (if GraphML version is enabled in the upload by the creators of the data)

• Includes some downloadable datasets

• Enables email-verified account creation (which allows revising related texts and withdrawing published graphs)

• No commenting on others’ graphs or datasets here

55

Page 56: Exploring Social Media with NodeXL

NodeXL Virtual Community and Resources

• NodeXL on CodePlex

• Source Code (open-source)

• Documentation

• Discussions

• Issues

• License (Ms-PL, Microsoft Public License)

56

Page 57: Exploring Social Media with NodeXL

9. Beyond NodeXL

Other Complementary Software Tools

57

Page 58: Exploring Social Media with NodeXL

Other (Complementary) Tools

Surface Web Data Collection

• Maltego Chlorine 3.6.0 (commercial “subscription” license but with a limited community version)

• NCapture of NVivo 10 (commercial license: perpetual or subscription-type site license)

Text Analysis

• Natural Language Toolkit (NLTK) in Python (open-source and free)

• AutoMap and NetScenes (CASOS) (open-source and free)

58

Page 59: Exploring Social Media with NodeXL

10. Presentation Review

59

Page 60: Exploring Social Media with NodeXL

Review: NodeXL Capabilities

NodeXL Capabilities with Social Media Data

• Data extractions from both social media platforms and the Surface Web (with VOSON or “Virtual Observatory for the Study of Online Networks” third-party data importer server)

• Additional social media platforms in the works

NodeXL Capabilities

• Network graph data processing

• Network graph analysis

• Graph visualizations

• Multi-lingual data processing

• … and others

• Addition of rudimentary sentiment analysis in commercial version (“NodeXL Pro”) released in 2015

60

Page 61: Exploring Social Media with NodeXL

A Short Note about the Sentiment Analysis Feature

• Based on a positive-negative polarity

• Uses a built-in positive word set and a built-in negative word set

• Customizable

• Enables the addition of a third type of word set (a new construct) based on a custom-made text set

61
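The word-set approach described above can be sketched in Python: tokens are matched against a positive set, a negative set, and an optional custom third set, and polarity is taken as positive minus negative matches. The tiny word lists and the "outage" construct are hypothetical placeholders, not NodeXL Pro's built-in lists.

```python
import re

# Hypothetical, tiny word sets; real lists are much larger
WORD_SETS = {
    "positive": {"good", "great", "love", "helpful"},
    "negative": {"bad", "awful", "hate", "broken"},
    "outage": {"down", "offline", "outage"},  # a custom third construct
}


def score_message(text, word_sets=WORD_SETS):
    """Count matches against each word set and report a simple polarity."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {name: sum(token in words for token in tokens) for name, words in word_sets.items()}
    counts["polarity"] = counts.get("positive", 0) - counts.get("negative", 0)
    return counts


print(score_message("Love the new release, but the API is down and search is broken"))
# {'positive': 1, 'negative': 1, 'outage': 1, 'polarity': 0}
```

Simple word counting misses negation ("not good") and sarcasm, which is one reason such sentiment scores are best treated as rough signals.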

Page 62: Exploring Social Media with NodeXL

62

Page 63: Exploring Social Media with NodeXL

11. Some General Takeaways about Network Analysis using Social Media Data

63

Page 64: Exploring Social Media with NodeXL

Some General Takeaways

• Unique aspects of social media platforms and their particular users. The social media platforms are constantly changing. Their users and their metrics are critical to understanding the extracted data. Also, only some voices are captured via social media platforms.

• In other words, who is online, and how are they actually using the social media platforms? What geographical regions are covered? (How does this skew the data?)

64

Page 65: Exploring Social Media with NodeXL

Some General Takeaways (cont.)

• Nature of social media platforms. The nature of each social media platform is important: whether it is for content sharing, knowledge structures, social networking (and for what purpose), and so on.

• Rules of engagement change what is seeable and seen in terms of messaging

• Technically, how “entities” and “relationships” are defined depends on the social media platform. (Read the fine print. Read the developers’ pages.)

• Continuous (dynamic data) vs. slice-in-time (static data); access to historical data

65

Page 66: Exploring Social Media with NodeXL

Some General Takeaways (cont.)

• A sampling. There are numerous dependencies in data extractions. The connectivity speed, the busyness of the target servers, the rate limiting of the application programming interfaces (APIs), the dynamism of the data, and other factors affect what is collected. The result is not a random sample, and it is hard to know how much of the full set has been captured. In most cases, only a very small sample is acquired.

• Very rarely is a full set possible, and only for particular types of data (such as an article network from Wikipedia).

66

Page 67: Exploring Social Media with NodeXL

Some General Takeaways (cont.)

• Data visualizations used with underlying data: The data visualizations are rich and varied; however, they always convey less than the full set of information. By definition, data visualizations are data summaries.

• The “graph metrics” table is a critical aspect of the information. Data visualizations should be used with the underlying data.

67

Page 68: Exploring Social Media with NodeXL

Some General Takeaways (cont.)

• Understandings of how social media platforms are used: The general public tends to be a lot faster than one would assume in terms of responding to breaking events with messaging across the various platforms.

• Any eventgraphing has to draw from all public sources (and across social media platforms) because each contributes different angles and perspectives on the events; each also attracts different portions of the population.

• (And of course, a lot of information is not publicly shared, so the whole social media angle is still somewhat limited.)

68

Page 69: Exploring Social Media with NodeXL

Some General Takeaways (cont.)

• Speed: With events unfolding on social media, most observers have turned to automated means to surveil and monitor communications. Computational text analytics (and visual analytics) are applied to the messaging in order to see

• what is trending

• the strength and direction of sentiments (positive or negative)

• the types of emotions expressed and in what textual contexts, and so on.

• There is progress in terms of computational visual analysis for object identification, facial recognition, and others.

69

Page 70: Exploring Social Media with NodeXL

12. Questions? Comments?

70

Page 71: Exploring Social Media with NodeXL

Questions? Comments?

71

• What research questions are you interested in pursuing? What is the potential role of social media in augmenting your (main) research?

• How do you think you might go about capturing the required social media information? How would you confirm or disconfirm any findings?

• What are complementary streams of data you could use to bolster your work?

Page 72: Exploring Social Media with NodeXL

Questions? Comments? (cont.)

• How would you pursue leads that are surfaced from social media? The Surface Web?

• How would you represent your work in publication and / or presentation (to show methods, explain complexity, and delimit your assertions)?

• What skills can you hone in order to better exploit public social media data? What do you perceive as strengths in this area? Weaknesses? Why?

72

Page 73: Exploring Social Media with NodeXL

Conclusion and Contact

• Dr. Shalin Hai-Jew

• Instructional Designer, iTAC, K-State

• 212 Hale / Farrell Library

[email protected]

• 785-532-5262

• Querying Social Media with NodeXL (an open-source text on the Scalar platform)

73

