Approaching Big Data: Lesson Plan

Date post: 15-Jul-2015
Upload: bessie-chu

Leveraging Engagement: Big Data Lesson

Agenda

What is Big Data?
•  Some Definitions
•  Mixed Methods Approach

Champions League & World Cup Case Study
•  Process
•  Results and Usage
•  Pitfalls and Learnings

Moving Forward
•  Data Approach Flow
•  Caveats
•  Organization and Communication

What is Big Data?

So many different definitions… nobody quite agrees… except that it’s definitely a buzzword

What is Big Data?

The one point of general agreement is that it’s messy and complex. This is both an opportunity and a challenge for us to innovate.

“an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.”

“Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.”

“Volume, Variety, Velocity, Variability, Complexity”

Quotes from:
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/2/
http://www.webopedia.com/TERM/B/big_data.html
http://en.wikipedia.org/wiki/Big_data

What Do We Need to Solve Big Data?

… for leveraging engagement at least.

•  Determine Right Questions and Goals for Data
•  Interdisciplinary Approach
•  Iterative Refinement

“Combining the what (quantitative) with the why (qualitative) can be exponentially powerful. It is also critical to our ability to take all our clickstream data and truly analyze it, to find insights that drive meaningful website changes that will improve our customers’ experiences.” – Avinash Kaushik

Answer: Mixed Methods and Innovation

Quote from: Web Analytics: An Hour a Day by Avinash Kaushik

CHAMPIONS LEAGUE AND WORLD CUP BIG DATA DISCOVERY PROCESS

Annenberg Lab Framework

GOALS

Sports Fan and Engagement Study Overall Goals for HAVAS

•  to identify and define communities of sports fans based around passion points (A)
•  to analyze fan interactions with those passions (B)
•  to position HAVAS Sports & Entertainment to more effectively advise brands on how to meaningfully engage with sports fans by leveraging passion-based communities (C)

Big Data Research Objectives

External for Havas
•  Discover a mixed methodology framework for sports and entertainment fan engagement

Internal for Lab
•  Justify our fan logic topology in relation to Twitter conversations through natural language processing

Initial Data Collection Steps

1) Modify the data collection process to fit live soccer events, using the Champions League as a test run
2) Establish a methodology for seeding the initial pool of users, keywords, and hashtags
3) Analyze tweets and how they fit into the logics of engagement
4) Establish a methodology for gaining insight from Twitter conversations

“Analyzing Big Data is a BIG JOB with Many People” – Jake

Inputs & Equipment
•  Keywords, hashtags, and user clusters file in a txt document
•  Dedicated server system collecting information

Engineering
•  Run and modify Python script
•  Register Public Screening API
•  Parse for results

Live Viewing Team
•  Team to watch game and look for patterns

Data Collection Process

1) Engineering & Team: Tech and Data Set-Up
2) Engineer: Run Script with Seed File
3) Team: Watch Event for Patterns and Additional Seeds
4) Team: Decide Data to Analyze
5) Engineer: Parse Data into User-Friendly Format
6) Team: Look at Data and prepare for next event

DATA SEED METHODOLOGY

Initial Keyword Seed Scoping

•  Keep it simple
•  Discover through observations

Soccer Hashtags and Keywords
•  Official Hashtags
•  Sponsors
•  Team Names
•  Key Terms
•  Key Players

Headliners
•  Official Organization Handles
•  Official Team Handles

Sponsors

Sponsors will often have official hashtags promoted during sporting events to cross-promote their brand and the sporting event.

Supporting Characters
•  Superfans: fans with unusual followings on Twitter
•  Sports Commentators: ESPN commentators and the like
•  Prominent Bloggers: blogs or bloggers with a large following on certain teams
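A minimal sketch of how a seed file organized by these categories might be read and flattened into one list of terms to track. The `[category]` file layout, the sample terms, and the function names are assumptions for illustration; the slides do not show the lab's actual seed file or script.

```python
# Hypothetical seed file layout: a [category] header followed by one term per line.
SAMPLE_SEED_FILE = """\
[official_hashtags]
#WorldCup
#UCL

[team_names]
Bayern
Argentina

[headliners]
@FIFAWorldCup
@FCBayern
#worldcup
"""


def parse_seed_file(text):
    """Group seed terms under their [category] headers, skipping blank lines."""
    seeds = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            seeds.setdefault(current, [])
        elif current is not None:
            seeds[current].append(line)
    return seeds


def track_list(seeds):
    """Flatten all categories into one list, de-duplicated case-insensitively."""
    seen = set()
    terms = []
    for category_terms in seeds.values():
        for term in category_terms:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                terms.append(term)
    return terms


seeds = parse_seed_file(SAMPLE_SEED_FILE)
terms = track_list(seeds)
```

Keeping the categories in the file makes it easy for the live viewing team to add new seeds under the right heading between events, while the collection script only needs the flattened list.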

Initial Data Seed Scoping Caveats
•  Twitter caps the Public API at a couple of thousand tweets per second
•  Tweets received through the Public API do not appear to be affected by location-based factors the way individual user feeds are
•  Twitter chunks these tweets using an undisclosed algorithm, based on what it deems important
•  The number of tweets scraped renders these factors nominal in terms of large-scale user behavior

ENGAGEMENT HYPOTHESIS & ANALYSIS

What kind of Tweets or tone in tweets fit into logics of engagement?
*Informed by survey and ethnography

•  Entertainment
•  Immersion
•  Social Connection
•  Identification
•  Mastery
•  Pride
•  Play
•  Advocacy

Operational Process

Plan for World Cup & Modeling with Beacon Capabilities

See how conversations analyzed from a big data perspective fit and build on the logics of engagement model

Determine what data frameworks worked in capturing useful information

Initial qualitative look at data

Exercise: Seed Scoping

Questions on Approach Before We Get Into Analysis?

Big Data Analysis Process to Dashboard

Big Data Basic Methods of Analysis

Textual
•  Text processing of tweets and plotting using algorithms into agglomerative clusters (aka cool visuals)
•  Frequency of terms, associations, and word clouds fall under here
•  Goal: Find the texts that spurred the most conversation
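As an illustration of the term-frequency part of the textual method, here is a small sketch using only the Python standard library. The stopword list, the tokenizer, and the sample tweets are assumptions made for this example, not the lab's actual processing.

```python
import re
from collections import Counter

# Illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "rt", "what"}


def tokenize(tweet):
    """Lowercase a tweet, drop URLs, and keep words, hashtags, and @handles."""
    tweet = re.sub(r"https?://\S+", " ", tweet.lower())
    return re.findall(r"[#@]?\w+", tweet)


def term_frequencies(tweets):
    """Count how often each non-stopword term appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(t for t in tokenize(tweet) if t not in STOPWORDS)
    return counts


sample_tweets = [
    "GOAL! What a strike #WorldCup",
    "RT @espn: Goal for Germany #WorldCup http://example.com/x",
    "Germany is in the final",
]
freqs = term_frequencies(sample_tweets)
top_terms = freqs.most_common(3)
```

The same counts can feed a word cloud or be compared across time windows to see which terms spiked around a moment in the game.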

Networks
•  A way to visually see social connection data
•  Understand forms of bonds and the connections between individual data points worth exploring
•  Goal: Detecting communities (our clusters, brands)
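A rough sketch of the network idea: treat each @mention as a link between two accounts and group accounts that are connected to each other. Connected components are a deliberately crude stand-in for real community detection, and the sample data and function names are hypothetical.

```python
import re
from collections import defaultdict, deque


def mention_edges(tweets):
    """Yield (author, mentioned_handle) pairs from (author, text) tuples."""
    for author, text in tweets:
        for handle in re.findall(r"@(\w+)", text):
            yield author.lower(), handle.lower()


def connected_groups(edges):
    """Return sets of accounts linked to each other through mentions."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first walk over everything reachable from start
            node = queue.popleft()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


sample = [
    ("fan_a", "RT @FCBayern: Matchday!"),
    ("fan_b", "@fan_a what a save"),
    ("fan_c", "Loving the coverage @BBCSport"),
]
groups = connected_groups(mention_edges(sample))
```

On real data most accounts end up in one giant component, which is why dedicated network tools use finer-grained community algorithms; this sketch only shows the shape of the input and output.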

Sentiment
•  Toolkits (such as Hootsuite) that measure “sentiment” using positive and negative language
•  Can be used to see if an initiative performed well
•  Goal: Measure success of a campaign at different times
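To make "positive and negative language" concrete, here is a minimal word-list sketch that scores tweets and averages the score per time window. The word lists, window labels, and sample tweets are invented for illustration, and this is not a description of how any commercial toolkit works.

```python
import re

# Illustrative word lists; real sentiment lexicons are far larger.
POSITIVE = {"great", "love", "amazing", "win", "brilliant"}
NEGATIVE = {"bad", "hate", "awful", "lose", "terrible"}


def sentiment_score(tweet):
    """Positive word count minus negative word count for one tweet."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_window(tweets):
    """Average score per time window for (window, text) pairs."""
    totals = {}
    for window, text in tweets:
        total, count = totals.get(window, (0, 0))
        totals[window] = (total + sentiment_score(text), count + 1)
    return {w: total / count for w, (total, count) in totals.items()}


sample = [
    ("pre-match", "Love this team, great lineup"),
    ("pre-match", "Kickoff soon"),
    ("post-match", "Awful refereeing, terrible result"),
]
averages = sentiment_by_window(sample)
```

Word counting misses sarcasm, negation, and slang, so results like these are best read alongside the qualitative data rather than on their own.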

Big Data Low-Hanging Fruit - Topline

Author Screenname    RT
FIFAWorldCup         76172
9GAG                 37459
DFB_Team_EN          21247
BBCSport             19564
FCBayern             14782
FTBpro               13409
_Snape_              11371
benparr              10616
TheTweetOfGod        9435
espn                 7465
Queen_UK             7174
thereaIbanksy        7113
sulsultm3            6646
damnitstrue          6603
asshaaban            6513
SportsCenter         6470
fifaworldcup_es      6365
LicDice_             6361
FIFAworldcup_e       6241
DFB_Team             6114
Argentina            5964
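A topline count like the one above could be produced along these lines. This sketch assumes retweets appear as text beginning with "RT @handle:", which is an assumption about the collected data; the slides do not say how the counts were actually computed.

```python
import re
from collections import Counter

# Assumes retweets are stored as text starting with "RT @handle:".
RT_PATTERN = re.compile(r"^RT @(\w+):")


def retweeted_author_counts(tweet_texts):
    """Count how many collected tweets are retweets of each author."""
    counts = Counter()
    for text in tweet_texts:
        match = RT_PATTERN.match(text)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "RT @FIFAWorldCup: Full time!",
    "RT @FIFAWorldCup: Kick-off in 10 minutes",
    "RT @BBCSport: Team news",
    "Watching the match with friends",
]
top = retweeted_author_counts(sample).most_common(2)
```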

Big Data Low-Hanging Fruit – Sentiment Analysis

1) Fan Handles
2) Game Data
3) Brand Data

Integrate insights with Ethnographic and Survey Data for final deliverables

Initial Idealized Approach

•  Survey Twitter Handles
   –  See if their online behavior matches survey logics
   –  What does the content they’re sharing look like
   –  Trends by cluster, gender, other data points

•  Match Data
   –  Look for clusters of behavior to events in games
   –  See popularity of brand campaigns and behavioral response to brand stories
   –  Gain insight from bursts of activity and real-time marketing
   –  See what are characteristics of influencers

•  Brand Data
   –  Identify how these strategies were executed in online conversations and responses
   –  Identify types of interactions/content/other markers around brands on Twitter
   –  Do influential brands mean consistent users interacting across brands? Why are people interacting in this way? How can we categorize these interactions according to our logic clusters?
   –  Was the content agile?
   –  See how users responded by the logics to different types of content
   –  Look for differences in fan response and fan-initiated behavior to the brands
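One way to picture the "Survey Twitter Handles" step is a join between survey respondents and their coded tweets. The sketch below assumes the survey yields (handle, cluster) pairs and that tweets have already been coded with a logic; both data shapes and all names are hypothetical.

```python
from collections import Counter, defaultdict


def logic_counts_by_cluster(survey_rows, coded_tweets):
    """Tally coded tweet logics per survey cluster.

    survey_rows:  iterable of (handle, cluster) pairs from the survey
    coded_tweets: iterable of (handle, logic) pairs from coded Twitter data
    """
    cluster_of = {handle.lower(): cluster for handle, cluster in survey_rows}
    table = defaultdict(Counter)
    for handle, logic in coded_tweets:
        cluster = cluster_of.get(handle.lower())
        if cluster is not None:  # skip tweets from handles not in the survey
            table[cluster][logic] += 1
    return {cluster: dict(counts) for cluster, counts in table.items()}


survey = [("fan_a", "Cluster 1"), ("fan_b", "Cluster 2")]
tweets = [
    ("Fan_A", "Pride"),
    ("fan_a", "Pride"),
    ("fan_b", "Play"),
    ("stranger", "Mastery"),
]
result = logic_counts_by_cluster(survey, tweets)
```

Comparing a table like this with each cluster's survey responses is one way to check whether online behavior matches the survey logics.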

Questions and Hypothesis

What We Planned To Do

•  Steps
   •  Define interesting WC fan moments and brand moments
   •  Examine moments in time and certain brand campaigns
   •  Investigate possible Natural Language Processing tools
   •  Formulated Questions

•  Timeline
   •  Created a timeline assigning roles to each person

•  Deliverables
   •  TBD, likely looking at clusters of behavior around brand campaigns.
   •  Sentiment analysis may tie in here

Ethnographic Report
-What did people say about the brand or the logics they used?

Survey Data
-Under this brand logic utilized, what is the intensity and who are the clusters?

Big Data
-How did audiences respond online to actions by the brand?

Approaching with Mixed Methods

Exercise: Group Datasets

Figure out what insight you might be able to get from each piece of data and how you would apply mixed methods.

Dashboard Process

The Future of Social Media Analytics

“We will be moving beyond key-word based queries into machine-learning algorithms. Influencers whom I have [spoken] with echo similar ideas about the increasing use and [refinement] of latent semantic indexing (or some variant of it) and other machine-learning algorithms in order to improve social listening, automatic categorization of content, and the ability to take action on data” – Marshall Sponder

Key Learnings for Mood Board

Ethnography

Survey Twitter Data

Brazil

Brought Together All Data

Concept Creation

The Dashboard Build Process

1) Pulled 250 Retweeted Tweets with Verification from BigSheets
2) Coded Tweets According to Logic for Testing Data
3) Built Dictionary According to Sample Tweets, Ethnography, Survey
4) Created Natural Language Processing and Machine Learning Algorithms
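The dictionary step can be pictured as a lookup from words to logics of engagement. The sketch below is a toy version: the dictionary entries, the scoring rule (most matching words wins), and the function name are all assumptions, since the slides do not reproduce the lab's dictionary or algorithms.

```python
import re

# Hypothetical mini-dictionary mapping three of the eight logics to sample words.
LOGIC_DICTIONARY = {
    "Pride": {"proud", "our", "nation"},
    "Mastery": {"stats", "tactics", "formation"},
    "Social Connection": {"friends", "together", "family"},
}


def classify_by_dictionary(tweet, dictionary=LOGIC_DICTIONARY):
    """Return the logic whose word list overlaps the tweet most, or None."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    scores = {logic: len(words & terms) for logic, terms in dictionary.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


labels = [
    classify_by_dictionary("So proud of our nation tonight"),
    classify_by_dictionary("Watching with friends and family"),
    classify_by_dictionary("Kickoff at 9"),
]
```

A dictionary like this is easy for the whole team to read and extend from ethnography and survey findings, and its output can also serve as a baseline or as features for a machine learning model.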

Fan Engagement Dashboard Prototype

Model

Technology

Collaboration

Innovation

jStart Beacon Custom-Built Twitter Collection Web App jStart BigSheets

Leveraging Engagement Framework

Annenberg Innovation Lab Fan Engagement Dashboard built through collaboration and mixed methods learning.

67% Accuracy in classifying tweets by Logic of Engagement leading to actionable insight and business intelligence for Leveraging Fan Engagement.
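For readers new to the term, accuracy here can be read as the share of tweets where the model's logic matches the human-coded logic. The sketch below shows that calculation on made-up labels; it does not reproduce how the 67% figure was obtained, which the slides do not describe.

```python
def accuracy(predicted, coded):
    """Fraction of tweets where the predicted logic equals the human-coded logic."""
    if len(predicted) != len(coded):
        raise ValueError("predicted and coded must have the same length")
    if not coded:
        raise ValueError("need at least one coded tweet")
    correct = sum(p == c for p, c in zip(predicted, coded))
    return correct / len(coded)


# Made-up labels for three tweets: two of the three predictions match.
coded = ["Pride", "Play", "Mastery"]
predicted = ["Pride", "Play", "Pride"]
score = accuracy(predicted, coded)
```

With eight possible logics, a single accuracy number hides which logics are confused with each other, so a per-logic breakdown is a useful follow-up when refining the model.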

The Process End-to-End
•  Collecting and Managing Data
•  Data Back Up
•  Data Clean Up
•  Run Models
•  Gain Insights
•  Refine Models
•  Learn Actionable Insights
•  Communicate Insights (Reports, Infographic Blueprints)
•  Create Initial Dictionary for Natural Language Processing
•  Annotate/Code Tweets for Training Data for Machine Learning
•  Created Dashboard
•  Improve on Design

Now What?

Moving Forward

Your Challenge
•  Your data will be different client-to-client
•  Twitter is just the beginning
•  You will get to be creative and work on collaborative cross-functional teams to dive into the data
•  *This will be both rewarding and potentially difficult

Tasks Ahead
•  Begin thinking about what you can learn from data to help our sponsors reach their goals
•  Start thinking about how your fans behave in your approach to figuring out what questions to ask the data

Most Basic Steps

1) Determine Goals
2) Capture Data
3) Curate Data
4) Merge Datasets and Bring Together Methodologies if Necessary
5) Additional Data Processing to Usable Form
6) Deliver Insight to the Client

Thinking About Process

Bumps in the Road Ahead
•  Privacy Issues and Respecting the Fans
•  Company layers and politics – releasing data from companies is fraught with back and forth
•  Getting data into a usable form
•  Assumptions were wrong or have to be redefined – it’s ok to fail fast – but be ready to keep moving
•  Working in cross-functional groups

Image from: CapGemini http://www.capgemini.com/sites/default/files/technology-blog/files/2012/09/big-data-vendors.jpg

Cross-Functional Communication

•  Goal
•  Timing
•  Point People
•  Resources Needed

Bring it Together

Draw connections between the data sets and how they could relate to the eight logics and situational triggers.

“While social media data are always interesting in themselves (at least, for an analyst), when business owners are able to combine data and layer them efficiently, the information will become more useful and actionable.” – Marshall Sponder

Thank You

Questions?
