Social Media Data Mining

Post on 11-Feb-2017

Collecting and Analyzing Social Media Data

Data Mining and Sentiment Analysis with Twitter’s Streaming API

Ryan Reede Oct. 2015

We’ve Created a Monster...

We generate unimaginable amounts of data.

In one minute on the web there are...

275,000 Tweets

417,000 Tinder Swipes

2,460,000 Facebook Posts

1,800,000 Likes on 216,000 new Instagram Posts

+ User behavior

What has made this possible?

Faster Computers

Cheaper Storage

Smarter Software Algorithms

(What tweets look like on the server)

How do we get from this...

BETris App Insight through Social Media

Quick hits:

Our players get extremely frustrated on ‘insane’ mode.

The iAds are too distracting.

Over 200 players screenshotted and shared their high score on Instagram today.

Advanced analysis:

Players are most active on Twitter after 9pm, so let’s sponsor tweets then.

90% of first-time players are logging in with Twitter accounts, so let’s focus resources there rather than on Facebook integration.

...to info like this?

Data Mining is the aggregation and analysis of massive amounts of information by computers to extract meaning.

What’s so great about Social Media specific data?

Human interaction becomes observable, at massive scale.

It’s the only way to gain access to data on nearly half of the world.

Simpler → More advanced analysis

Identify influencers and communities online.

Measuring an individual's influence online.

Recommending content to users.

Data Mining?

Proper analysis of Social Media data can’t be done alone:

Sociology

Computer Science/Machine Learning

Statistics

Neuroscience

Ethnography and more…

Inherent challenges exist:

Social data comes in many different forms

Proprietary/Hard to access

The goal of Data/Computer Science

Making everything measurable numerically

Multidisciplinary Approach

Application Programming Interfaces help developers bring services from one company’s software into their own and vice-versa.

Uber uses the Google Maps API to display maps in their app.

HootSuite uses Twitter’s Post API to allow users to tweet from their app.

APIs for Data Mining:

APIs can offer easy access to valuable data.

Using APIs does not always require a high degree of technical skill.

Facebook & Twitter have free public feed APIs

Gathering Data: APIs
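
To make the API idea concrete, here is a minimal sketch of what consuming a tweet stream might look like once the data arrives. It assumes the stream delivers one JSON message per line, with blank keep-alive lines and occasional non-tweet messages mixed in. The sample messages and the helper name `parse_stream_lines` are hypothetical stand-ins for a live connection, which would also need credentials that are not shown here.

```python
import json


def parse_stream_lines(lines):
    """Yield (screen_name, text) for each tweet in newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line, nothing to parse
        message = json.loads(line)
        if "text" not in message:
            continue  # not a tweet (for example, a delete notice)
        yield message["user"]["screen_name"], message["text"]


# Hypothetical sample data standing in for a live stream.
sample_stream = [
    '{"created_at": "Thu Oct 01 21:15:00 +0000 2015", '
    '"text": "Insane mode is impossible!", '
    '"user": {"screen_name": "player_one"}}',
    "",
    '{"delete": {"status": {"id": 1}}}',
    '{"created_at": "Thu Oct 01 21:16:30 +0000 2015", '
    '"text": "New high score on BETris", '
    '"user": {"screen_name": "player_two"}}',
]

tweets = list(parse_stream_lines(sample_stream))
for screen_name, text in tweets:
    print(f"@{screen_name}: {text}")
```

Writing the parser as a generator means each message is handled as it arrives, so the same function could be pointed at a long-running stream instead of a list.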

Social Network Analysis and Graph Theory:

Dijkstra’s Algorithm: shortest path between two nodes (individuals) in a graph (social network).

Tightness of a community in a network

Similar algorithms can also measure:

Centrality to a community

Closeness to another individual
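
The graph measures above can be sketched in a few lines. This is an illustrative example, not the code used in the talk: the five-person network, the names in it, and the equal weight of 1 on every connection are all made up. `dijkstra` finds the shortest distance from one person to everyone reachable, and `closeness` uses one common definition of closeness centrality (number of reachable people divided by the sum of the distances to them).

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node."""
    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


def closeness(graph, node):
    """Closeness centrality: reachable others / sum of distances to them."""
    distances = dijkstra(graph, node)
    total = sum(d for other, d in distances.items() if other != node)
    return (len(distances) - 1) / total if total else 0.0


# Hypothetical social network; every connection has weight 1.
network = {
    "ana": {"ben": 1, "cal": 1},
    "ben": {"ana": 1, "dee": 1},
    "cal": {"ana": 1, "dee": 1},
    "dee": {"ben": 1, "cal": 1, "eli": 1},
    "eli": {"dee": 1},
}

print(dijkstra(network, "ana"))
print({person: round(closeness(network, person), 2) for person in network})
```

In this toy network "dee" has the highest closeness score because nobody is more than two connections away from her.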

Analyzing content can yield other insight

Lexical analysis

Recommendation algorithms

Step 2: Processing the Data

Business decision making will never be the same

Pre-demo Takeaways

Social Media generates petabytes of varying data for mining.

Acquiring this data can be done various ways, but APIs can make it easy.

Most Social Networks have these to share some data!

If you’re not a developer, tools built on top of these APIs can be valuable.

IBM’s Watson Analytics

Sentiment140 and Topsy

Awareness of basic analysis methods

Small Scale Data Mining: Computing Tweet Sentiment

Built in Python with the Twitter Streaming API

Lexical analysis compares each word in a tweet to a 3,500-word dictionary and adds up the matches for a total sentiment score.
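
A minimal sketch of that kind of lexical scoring follows. The six-word lexicon and its scores are hypothetical stand-ins for the full dictionary, which is not reproduced in these slides, and the function name `tweet_sentiment` is also an assumption. Each word found in the lexicon adds its score to the total, and unknown words count as zero.

```python
import re

# Hypothetical stand-in for the full sentiment dictionary.
SENTIMENT_LEXICON = {
    "love": 3,
    "great": 3,
    "fun": 2,
    "distracting": -2,
    "frustrated": -2,
    "hate": -3,
}


def tweet_sentiment(text, lexicon=SENTIMENT_LEXICON):
    """Sum the lexicon scores of the words in a tweet (unknown words add 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


for tweet in [
    "I love BETris, great fun!",
    "The iAds are too distracting",
    "So frustrated, I hate insane mode",
]:
    print(tweet_sentiment(tweet), tweet)
```

A word-by-word sum like this ignores negation and sarcasm ("not fun" still scores as positive), so it is best read as a rough signal across many tweets rather than a verdict on any single one.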