1
What do we do with all this big data?
Fostering insight and trust in the digital age
Susan Etlinger, Industry Analyst, Altimeter Group
February 11, 2015
Photo: qthomasbower, cc 2.0
2
“Orwell feared those who would deprive us of information. Huxley feared those who would give us so much that we would be reduced to passivity and egotism. Orwell feared that the truth would be concealed from us. Huxley feared the truth would be drowned in a sea of irrelevance.”
Neil Postman, Amusing Ourselves to Death, 1985
3
What’s so hard about big data?
4
“Ninety percent of all the data in the world was created in the past two years.”
− IBM
5
What is big data?
Big Data as defined by Gartner*:
• Volume
• Velocity
• Variety
Variety is the most challenging.
* NB: This is really just a starting point for understanding big data. See Gartner for research on origins and definitions.
6
With Big Data, Size Isn’t Everything
Images, Text, Video, Audio
7
How unstructured data disrupts
1. Does not conform to standard data models
2. Demands new analytical approaches
· Human expression: images, text. Much is unstructured.
· Raw material: requires processing to translate it into something a machine can understand and act upon
3. Strains traditional methodologies
Source: http://www.foreignpolicy.com/articles/2014/09/26/why_big_data_missed_the_early_warning_signs_of_ebola?utm_content=buffer6a337&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer#trending
8
A simple example…
Source: http://www.foreignpolicy.com/articles/2014/09/26/why_big_data_missed_the_early_warning_signs_of_ebola?utm_content=buffer6a337&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer#trending
9
“The nature of human language demands rigorous and repeatable processes to extract meaning from it in a transparent and defensible way.”
10
Unstructured Data Requires New Analytics
11
Unstructured Data Requires New Analytics
12
Case Study: Health Media Collaboratory
What can social data tell us about smoking cessation?
• How much electronic promotion exists on Twitter?
• How much organic conversation about e-cigarettes is on Twitter?
Understanding the impact of CDC anti-smoking commercials
• Did the commercials work?
• How can we prove it?
13
Disambiguating e-cigarettes
14
Disambiguating “smoking”
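One way to approach this kind of disambiguation is a keyword filter that pairs inclusion terms with exclusion phrases. The sketch below is illustrative only: the term lists, the `is_tobacco_related` helper, and the sample phrases are hypothetical and are not taken from the Health Media Collaboratory study.

```python
import re

# Hypothetical inclusion terms, covering slang and variant spellings.
# These are assumptions for illustration, not the study's keyword list.
INCLUDE = re.compile(
    r"\b(smok(?:e|es|ed|ing|in)|cigarettes?|cigs?|e-?cigs?|vap(?:e|es|ing))\b",
    re.IGNORECASE,
)

# Hypothetical phrases where "smoking" or "smoked" has nothing to do with tobacco.
EXCLUDE = re.compile(
    r"\bsmok(?:ing|in)\s+hot\b|\bsmoked\s+(?:salmon|ribs|brisket)\b",
    re.IGNORECASE,
)


def is_tobacco_related(text: str) -> bool:
    """Return True if text matches an inclusion term and no exclusion phrase."""
    return bool(INCLUDE.search(text)) and not EXCLUDE.search(text)


if __name__ == "__main__":
    samples = [
        "I quit smoking after that ad",
        "That guitar solo was smoking hot",
        "Trying an ecig instead",
    ]
    for sample in samples:
        print(sample, "->", is_tobacco_related(sample))
```

A filter like this only narrows the candidate set. Ambiguous matches would still need human review, which is where the human coding step in the methodology comes in.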
15
Methodology
1. Data collection. Determine appropriate source and sample size of the data.
2. Keyword selection. Generate the most comprehensive possible list of keywords, encompassing nonstandard English usages, slang terms, and misspellings.
3. Metadata. Collect metadata related to the tweets:
· A tweet ID (a unique numerical identifier assigned to each tweet)
· The username and biographical profile of the account used to post the tweet
· Geolocation (if enabled by the user)
· Number of followers of the posting account
· The number of accounts the posting account follows
16
Methodology
3. Metadata (continued). Collect metadata related to the tweets:
· The posting account’s Klout score
· Hashtags
· URL links
· Media content attached to the tweet
· Filtering for engagement
4. Human coding. To assess relevance and code message content.
5. Precision and relevance. Combination of human and machine coding.
6. Recall. To determine whether data was generalizable.
7. Content coding. To determine message effectiveness.
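Steps 5 and 6 rest on two standard measures. Precision is the share of machine-selected tweets that human coders also judged relevant; recall is the share of all human-judged relevant tweets that the machine selected. The following is a minimal sketch, assuming each tweet carries one human label and one machine label. The function name and the sample labels are hypothetical, not data from the study.

```python
from typing import Sequence, Tuple


def precision_recall(
    human: Sequence[bool], machine: Sequence[bool]
) -> Tuple[float, float]:
    """Compare machine relevance labels against human-coded labels.

    human[i] is True if coders judged tweet i relevant;
    machine[i] is True if the automated filter selected tweet i.
    """
    if len(human) != len(machine):
        raise ValueError("label lists must have the same length")

    true_pos = sum(1 for h, m in zip(human, machine) if h and m)
    false_pos = sum(1 for h, m in zip(human, machine) if not h and m)
    false_neg = sum(1 for h, m in zip(human, machine) if h and not m)

    # Guard against division by zero when nothing was selected or nothing is relevant.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels for six tweets, purely for illustration.
    human_labels = [True, True, True, True, False, False]
    machine_labels = [True, True, False, False, True, False]
    p, r = precision_recall(human_labels, machine_labels)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision suggests the keyword list is pulling in irrelevant tweets; low recall suggests it is missing relevant ones, for example through slang or misspellings.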
17
“Eighty-seven percent of the tweets about the TV commercials expressed fear, and the ads had the desired result of jolting the audience into a thought process that might have some impact on future behavior.”
− Health Media Collaboratory
18
From data to insight
19
“The type of data (structured, text, etc.) isn’t the point at all. The way of thinking matters.”
− Philip B. Stark, professor and chair of statistics, University of California, Berkeley
20
Logic problems: causation vs. correlation
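A small worked example can make the distinction concrete. In the sketch below, two series are strongly correlated only because a third variable drives both. All numbers and variable names are made up for illustration and do not come from the presentation.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical hidden driver: daily temperature.
temperature = [15, 18, 22, 27, 31, 35]

# Both series depend on temperature plus a small made-up offset;
# neither one is computed from the other.
ice_cream_sales = [3 * t + d for t, d in zip(temperature, [2, -1, 1, -2, 0, 1])]
sunburn_cases = [2 * t + d for t, d in zip(temperature, [-1, 2, 0, 1, -2, 1])]

if __name__ == "__main__":
    r = pearson(ice_cream_sales, sunburn_cases)
    print(f"correlation between ice cream sales and sunburn cases: {r:.3f}")
```

The correlation comes out close to 1, yet ice cream does not cause sunburn. The shared dependence on temperature explains the relationship, which is why a strong correlation alone is not evidence of causation.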
21
From Insight to Trust
22
“In civilized life, law floats in a sea of ethics.”
− Earl Warren, former chief justice of the United States
23
Three Recent Examples
24
Ethical Issues Related to Data
25
Planning for data ubiquity
26
Planning for 2015
① Define data strategy and operating model
② Update analytics methodology to reflect new data realities
③ Seek out critical thinking and diverse skill sets
④ Insist on ethical data use and transparent disclosure
⑤ Reward and reinforce humility and learning
27
Thank You
Disclaimer: Although the information and data used in this report have been produced and processed from sources believed to be reliable, no warranty expressed or implied is made regarding the completeness, accuracy, adequacy or use of the information. The authors and contributors of the information and data shall have no liability for errors or omissions contained herein or for interpretations thereof. Reference herein to any specific product or vendor by trade name, trademark or otherwise does not constitute or imply its endorsement, recommendation or favoring by the authors or contributors and shall not be used for advertising or product endorsement purposes. The opinions expressed herein are subject to change without notice.
Altimeter Group provides research and advisory for companies challenged by business disruptions, enabling them to pursue new opportunities and business models.
Susan [email protected]
susanetlinger.com
@setlinger