Posted on 12-May-2015
Building Effective Frameworks for Social Media Analysis
Presented by: Josh Liss
Open Analytics Summit DC 2013
Segue
- 200+ million users in 200 countries - TechCrunch
- Incredible amount of personal information
- Reached 10 million monthly unique visitors faster than any independent site in history - Sirona Consulting
- 28.1% of users have an annual household income of $100K+ - Ultralinx
- 1+ billion monthly active users - Facebook
- 17 billion geo-tagged pictures & check-ins - Gizmodo
- 230+ million monthly active users - GlobalWebIndex
- 175 million tweets/day in 2012 - Infographic Labs
- Google+ button used 5 billion times/day - AllTwitter
- 625,000 new users on Google+ every day - AllTwitter
Agenda
• Social Media: An Intelligence Perspective
• Common Analytic Pitfalls
• An Analytic Framework
• Case Study: Superstorm #Sandy
  – Problem Definition
  – Source Selection
  – Data Capture
  – Data Reporting
  – Data Analysis
• The Way Forward: Do’s & Don’ts
• Discussion
Intelligence
• Intelligence is information that has been transformed to meet an operational need
[Diagram: Data → Intelligence, transformed through an operational lens]
Intelligence Cycle
• No matter what methodology you use, intelligence analysis is an iterative process.
[Diagram: Collect → Store → Analyze → Distribute, repeating as a cycle]
Social Media: Intelligence Perspective
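• Intelligence derived from social media brings with it the best and worst aspects of:
  – HUMINT (human intelligence)
  – SIGINT (signals intelligence)
  – OSINT (open-source intelligence)
[Diagram: overlapping HUMINT, SIGINT, and OSINT]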
Social Media Analysis Goals
• Provide value to the organization: turn data into intelligence using an “operational lens”
• Ensure cyclical feedback occurs during collection, processing, analysis, and consumption
• Validate that a particular network is the right source of data for the questions you need answered
• $$$
Common Misconceptions
• Social media is not a panacea:
  – Not everyone uses social media
  – Users of social media use it unevenly
  – User behavior changes based on the situation
• Just because people can talk about anything does not mean they talk about everything all the time.
Common Pitfalls
• Analyzing What Instead of Why: The important thing is often not what people are saying, but why they are saying it.
• Using the Wrong Analysis Tools: Reporting tools rarely help dig into the why. Many common tools, reports, and metrics are misleading:
  – Word clouds atomize message context
  – Sentiment metrics are often highly inaccurate
  – Information in aggregate hides more than it reveals
Pitfalls: An Example of the Challenge
Dangers of Disintegration
Source: Matthew Auer, Policy Studies Journal, Volume 39, Issue 4, pages 709–736, Nov 2011
The problems are analytical rather than aesthetic or technical. The context is virtually indecipherable.
Analytic Framework
• Data Capture (DC): what to measure
• Data Reporting (DR): what the data is saying
• Data Analysis (DA): what should be done based on the data
Source: Avinash Kaushik, Occam’s Razor Blog http://www.kaushik.net/avinash/web-analytics-consulting-framework-smarter-decisions/
[Diagram: Capture → Report → Analyze]
Choosing a Platform
• Social media, and the ways it is used, are relatively new and evolving rapidly:
  – Static approaches to social media are flawed from the outset
  – No one metric or set of metrics will always tell you what is happening
  – There is no turn-key solution to all problems
• Platforms need to be open and highly adaptable to facilitate data capture, reporting, and analysis
Case Study: Superstorm Sandy
• Industry: Disaster Response/Crisis Informatics
  – 14 billion-dollar disasters in 2011
  – 11 billion-dollar disasters in 2012
• Over $100 billion in total damages
• Oct 29, 2012: Hurricane Sandy
  – $50+ billion in damages
  – 72 deaths directly attributed to the storm
• Additional 87 deaths indirectly attributed
• Can social media SAVE money/lives/resources?
Problem Definition
• Question: How can social media assist civil authorities in responding to natural disasters?
  – Prevent/limit loss of life and limb
  – Prevent/limit damage and loss of property
  – Protect critical infrastructure
• Challenge: Capture relevant information from social media sources.
  – Query too large/broad = false positives
  – Query too small/narrow = missed information
  – Signal vs. noise
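The query-breadth trade-off above can be made concrete with precision (how many captured tweets are relevant) and recall (how many relevant tweets were captured). A minimal Python sketch, assuming a small hand-labeled sample; the tweets and the `BROAD` and `NARROW` queries are hypothetical, not data from the Sandy collection:

```python
# Toy illustration of the query-breadth trade-off. SAMPLE is a hypothetical
# hand-labeled set: (tweet text, relevant to disaster response?).
SAMPLE = [
    ("Water rising fast on 14th St in Hoboken, need a boat #Sandy", True),
    ("Power lines down on Ocean Ave, stay away #Sandy", True),
    ("Gas station on Route 9 has fuel and a short line", True),
    ("Tree fell on a car in Brooklyn, nobody hurt", True),
    ("Sandy is my favorite character in Grease", False),
    ("Hurricane party tonight, bring snacks #Sandy", False),
]

BROAD = ["sandy", "power", "gas", "tree", "water"]  # hypothetical broad query
NARROW = ["need a boat", "power lines down"]        # hypothetical narrow query


def matches(text, keywords):
    """True if any keyword appears in the text (case-insensitive substring)."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def precision_recall(keywords, sample):
    """Precision: share of matched tweets that are relevant.
    Recall: share of relevant tweets that were matched."""
    matched = [relevant for text, relevant in sample if matches(text, keywords)]
    true_pos = sum(matched)
    total_relevant = sum(relevant for _, relevant in sample)
    precision = true_pos / len(matched) if matched else 0.0
    recall = true_pos / total_relevant if total_relevant else 0.0
    return precision, recall


for name, query in [("broad", BROAD), ("narrow", NARROW)]:
    p, r = precision_recall(query, SAMPLE)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy sample the broad query scores precision 0.67 and recall 1.00, while the narrow one scores 1.00 and 0.50: breadth buys coverage at the cost of noise, narrowness buys signal at the cost of blind spots. Scoring a small labeled sample like this is one way to tune a query before running it against the full stream.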
The Source: Twitter
• Twitter has excellent analytical potential:
  – Enormous volume: 400+ million tweets per day
  – Large user base: 200+ million active users
  – Open API
• But it’s not without its limitations:
  – 140 characters
  – Limited historical (look-back) capacity without using a 3rd-party provider like DataSift or GNIP = $$$
  – Anonymity, credibility
  – Fact vs. satire
Data Capture
• 975,000+ tweets
  – Filters: temporal, geo, keywords, hashtags
  – Timeline: 28 Oct to 06 Nov
    • Pre-landfall, landfall, aftermath, recovery
  – Geo focus on the tri-state area
• Entity extraction / sentiment
  – NLP extracts the entities, events, and associations from unstructured text
    • Isolates Twitter handles, keywords, URLs, etc.
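The capture filters above (temporal, geo, keyword/hashtag) compose naturally as one predicate per tweet. The following is a sketch, not the tooling used in the case study: the `Tweet` record, the term list, and the tri-state bounding box are hypothetical, and the box is only a rough approximation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class Tweet:
    user: str
    text: str
    created_at: datetime                           # assumed already in UTC
    coords: Optional[Tuple[float, float]] = None   # (lat, lon) when geo-tagged


# Temporal filter: 28 Oct through 06 Nov 2012 (end bound is exclusive).
WINDOW_START = datetime(2012, 10, 28)
WINDOW_END = datetime(2012, 11, 7)
# Geo filter: hypothetical, rough bounding box around the NY/NJ/CT tri-state area.
TRI_STATE_BBOX = (38.9, -75.6, 42.1, -71.8)  # (min_lat, min_lon, max_lat, max_lon)
# Keyword/hashtag filter: hypothetical term list.
TERMS = ("#sandy", "hurricane", "flood", "power outage")


def in_bbox(coords, bbox):
    lat, lon = coords
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def keep(tweet: Tweet) -> bool:
    if not (WINDOW_START <= tweet.created_at < WINDOW_END):
        return False
    # Only geo-tagged tweets can be rejected on location; untagged ones pass.
    if tweet.coords is not None and not in_bbox(tweet.coords, TRI_STATE_BBOX):
        return False
    text = tweet.text.lower()
    return any(term in text for term in TERMS)


def capture(stream: Iterable[Tweet]) -> List[Tweet]:
    return [t for t in stream if keep(t)]
```

Letting untagged tweets through the geo filter is a deliberate choice here: since only a small share of tweets carry coordinates, rejecting every untagged tweet would discard most of the collection.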
Data Capture: Entities & Associations
[Diagram: a tweet annotated with its hashtags, Twitter handles, URL, unstructured keywords, and time/date stamp]
Who: Twitter handles, retweeters
What: hashtags, keywords, URLs
When: time, date
Where: geo (if available)
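Handles, hashtags, and URLs have regular enough shapes that simple pattern matching can pull them out and file them into the who/what/when/where slots above. A minimal sketch with Python's `re` module; it does not attempt the NLP keyword and event extraction described earlier, and the record layout and example handle are hypothetical:

```python
import re

URL_RE = re.compile(r"https?://\S+")
HANDLE_RE = re.compile(r"(?<!\w)@(\w+)")   # lookbehind skips e-mail addresses
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_entities(user, text, created_at=None, coords=None):
    """Split one tweet into who/what/when/where fields (hypothetical layout)."""
    urls = URL_RE.findall(text)
    rest = URL_RE.sub(" ", text)  # drop URLs so '#fragment' isn't read as a hashtag
    return {
        "who": [user] + HANDLE_RE.findall(rest),
        "what": {
            "hashtags": [h.lower() for h in HASHTAG_RE.findall(rest)],
            "urls": urls,
        },
        "when": created_at,
        "where": coords,  # None for most tweets; geo is only sometimes available
    }


record = extract_entities(
    "jersey_resident",  # hypothetical user
    "RT @example_oem: Shelter open at the high school #Sandy #NJ http://example.com/#map",
)
print(record["who"], record["what"])
```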
Data Reporting
[Charts: top keywords and top Twitter handles]
Data Analysis
• Analysis must be rooted in the operational need:
  – How can social media help civil authorities and first responders during natural disaster response and relief efforts?
• Emphasis on hypothesis generation, testing, and experimentation
Data Analysis: Hashtags
• Top hashtags were almost all generic or abstract:
  – Undermines tracking and understanding
  – Generates leads for further analysis
Top hashtags: #Sandy, #Recovery, #NYC, #Power, #Hoboken, #SandyABC7, #NJ, #Gas, #Brooklyn, #JERSEYSTRONG
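Counting how many tweets use each hashtag is the usual first step toward a ranking like the one above. A sketch, assuming tweets are available as plain strings:

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def top_hashtags(texts, n=10):
    """Rank hashtags by the number of tweets that use them (case-insensitive)."""
    counts = Counter()
    for text in texts:
        # A set counts each tag once per tweet, so repeating a tag
        # within a single tweet cannot inflate the ranking.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts.most_common(n)
```

A count like this surfaces the generic tags (#Sandy, #NYC) first, which is why the ranking is best treated as a source of leads rather than a finding.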
Data Analysis: Sentiment
• Sentiment analysis on small chunks of text like Tweets is generally poor
• Follow and convert linked URLs into derivative sources
Larger text sources, such as linked articles, offer value for sentiment analysis that tweets alone cannot.
Data Analysis: Sentiment
• Top negative and positive sentiment scores can provide a glimpse into aggregate attitudes
• Provide starting points for additional analysis
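To see why tweet-level sentiment is fragile, here is a deliberately simple lexicon-based scorer that ranks texts from most negative to most positive, in the spirit of using top scores as starting points. This is a swapped-in toy technique, not the sentiment engine used in the case study, and `LEXICON` is a hypothetical word list:

```python
import re

# Hypothetical toy lexicon for illustration only.
LEXICON = {
    "safe": 1, "thanks": 1, "restored": 1, "grateful": 1,
    "flooded": -1, "destroyed": -1, "scared": -1, "trapped": -1,
}
TOKEN_RE = re.compile(r"[a-z']+")


def score(text):
    """Sum of lexicon weights; ignores negation, sarcasm, and context."""
    return sum(LEXICON.get(tok, 0) for tok in TOKEN_RE.findall(text.lower()))


def extremes(texts, k=1):
    """Return (k most negative, k most positive) texts as leads for review."""
    ranked = sorted(texts, key=score)
    return ranked[:k], ranked[::-1][:k]
```

“Not flooded, we’re safe” scores 0 here because the scorer cannot see negation; the extremes are prompts to go read the source tweets, not conclusions.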
Data Analysis: Narrow the scope
Next Steps: Agile Intelligence
• New problem identified:
  – NYC 911 received approx. 20,000 calls/hour
  – Life/limb emergencies could not get through
  – Callers were prompted to text or call 311
  – NYC has spent $2 billion since 2009 “overhauling” the system
    • $680 million call center: “Unified Call Taker” system
• New question: Can social media serve as a supplement/alternative to traditional emergency response systems during natural disasters and states of emergency?
  – Promote/monitor hashtags
  – Dedicated analysts/dispatchers
  – Facilitate proactive use of local/city/state resources
Next Steps: Segment the Data
• Segment, or cluster, your data to explore patterns and trends at the micro level rather than across the entire dataset. Segment by:
  – User name or Twitter handle
  – Hashtags
  – Keywords
  – Geographic region
  – Timeline
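Segmenting is essentially a group-by over the captured records. A sketch, assuming each tweet has already been reduced to a dict with hypothetical `user`, `hashtags`, and `region` fields:

```python
from collections import defaultdict


def segment(records, key):
    """Group records into segments. key(record) returns zero or more labels,
    so one tweet with several hashtags lands in several segments."""
    groups = defaultdict(list)
    for rec in records:
        for label in key(rec):
            groups[label].append(rec)
    return dict(groups)


# Hypothetical key functions over hypothetical record fields.
def by_user(rec):
    return [rec["user"].lower()]


def by_hashtag(rec):
    return rec["hashtags"]


def by_region(rec):
    return [rec["region"]] if rec.get("region") else []  # skip untagged tweets
```

Returning a list of labels from each key function lets one tweet belong to several hashtag segments, and lets tweets without a region drop out of the geographic view instead of forming a misleading catch-all bucket.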
Next Steps: Try on different lenses
Highest traffic occurred during the height of the storm, despite spreading power outages
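Traffic-over-time views like this come from bucketing timestamps. A minimal sketch that counts tweets per clock hour and reports the busiest hour; it assumes timestamps are already `datetime` objects in one consistent time zone (Twitter reports UTC, so convert before lining results up against a local storm timeline):

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple


def hourly_volume(timestamps: Iterable[datetime]) -> Counter:
    """Count tweets per clock hour by truncating each timestamp to the hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def peak_hour(timestamps: Iterable[datetime]) -> Optional[Tuple[datetime, int]]:
    """Return (hour, count) for the busiest hour, or None if there is no data."""
    top = hourly_volume(timestamps).most_common(1)
    return top[0] if top else None
```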
Next Steps: Segment the Data
Fewer than 5% of tweets are geo-tagged
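The geo-tagged share is a simple ratio, but it is worth computing before relying on a geographic segment. A sketch, assuming records are dicts with a hypothetical `coords` field that is `None` for untagged tweets:

```python
def geotag_rate(records):
    """Fraction of records that carry coordinates (hypothetical 'coords' field)."""
    if not records:
        return 0.0
    tagged = sum(1 for rec in records if rec.get("coords") is not None)
    return tagged / len(records)
```

At roughly 975,000 tweets, a rate under 5% means a geographic segment rests on fewer than about 48,750 tweets.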
Next Steps: Graph Analysis
Visualize associations between top influencers
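One simple way to rank influencers is in-degree on a mention/retweet graph: how many distinct users mention or retweet a given handle. A pure-Python sketch; in-degree is a swapped-in, basic centrality measure rather than the method behind the original visualization, and the `(author, text)` input shape and handles are hypothetical:

```python
import re
from collections import defaultdict

MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def build_mention_graph(tweets):
    """Directed graph as {author: set of handles they mention or retweet}.
    Sets deduplicate repeat mentions; self-mentions are dropped."""
    edges = defaultdict(set)
    for author, text in tweets:
        src = author.lower()
        for target in MENTION_RE.findall(text):
            dst = target.lower()
            if dst != src:
                edges[src].add(dst)
    return dict(edges)


def top_influencers(edges, n=10):
    """Rank handles by in-degree: how many distinct users point at them."""
    in_degree = defaultdict(int)
    for targets in edges.values():
        for dst in targets:
            in_degree[dst] += 1
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Counting distinct users rather than raw mentions keeps one prolific account from manufacturing an influencer on its own.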
Next Steps: Findings
• Targeted queries based on tailored information requirements
• Findings:
  – Few legitimate “calls for help”
  – No dedicated hashtags
    • #help used for encouraging donations/volunteering
    • #distress used for
– Significant & accurate i-reporting on flooding, downed trees/power lines, fires, etc.
– Crowd-sourced info on where to find gas, food/water, donate goods, volunteer, etc.
– Despite widespread power outages, cell service was a life-line
Lessons Learned
• Don’t:
  – Try drinking from a fire hose
    • Sometimes less really is more
  – Use metrics you can’t tie to actions
  – Use visualizations or reports that strip the data from its context
Lessons Learned
• Do:
  – Segment data rather than attempting to work in the aggregate
– Look for the why behind the message
– Always return to the source material
– Explore alternative explanations
– Always consider the ultimate goal
Discussion
Success stories or lessons learned from social media analysis/monitoring in 2012?
Arguments for or against the use of social media?
Where will social media monitoring/analysis be in 2014?
Thank You!
Joshua Liss
jliss@ikanow.com
www.ikanow.com
github.com/ikanow/infinit.e