Date posted: 25-Jan-2015
Category: Education
Uploaded by: claude-g-theoret
The opportunity for Social Data Scientists
@cgtheoret
Part 1: The Explosion
Every minute 8-10 months ago:
• 48 hours of video are uploaded to YouTube
• 320 new accounts and 98,000 tweets appear on Twitter
• 168 million emails are sent
• 20,000 new posts appear on Tumblr
• 6,600 photos appear on Flickr
• Over 20% of all websites run on a CMS (WordPress, etc.)
Every minute today:
• 100 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear on Twitter
• 204 million emails are sent
• 28,000 new posts appear on Tumblr
• 1,600 photos appear on Flickr!!! No shit!
But…
• Facebook has lost 1.5 million users in Canada and 6 million in the United States
• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05% of all accounts)
Gartner is predicting an explosion in Social Media Analytics IT spending
In a lot of ways, Social “Big Data” is like Oil…
• Difficult and expensive to extract
Difficult and expensive to store and distribute
Cheapest (and least useful) when it's unrefined
In a lot of ways, “Big Data” is like Oil…
• Can’t be used by consumers unless refined
• More expensive at every step of refinement
The market is producing a plethora of derived, higher-value data products
In a lot of ways, “Big Data” is like Oil…
• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest in its unrefined form
• More expensive at every step of refinement
• Produces a plethora of derived products
• And it’s actually quite “dirty”!!!!
Part 2
Social data is one of the reasons why IBM added a 4th V (alongside Volume, Velocity and Variety) to the Big Data definition:
VERACITY
Social Data Analytics = Oil Refineries
Six factors affect data veracity:
1. Accuracy: Is it true?
2. Precision: If true, what is the error margin?
3. Reliability: Is it there all the time?
4. Provenance: Can you trace the source?
5. Fidelity: Did it change from the source?
6. Permission: Can you use it in this context?
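As a rough illustration, the six veracity factors can be treated as a yes/no checklist applied to each social data source. The sketch below is a hypothetical Python encoding, not the author's method: the class name `VeracityCheck`, the pass/fail simplification of each factor and the equal-weight score are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """Hypothetical checklist: one pass/fail answer per veracity factor."""
    accuracy: bool     # Is it true?
    precision: bool    # Is the error margin acceptable? (threshold is an assumption)
    reliability: bool  # Is it there all the time?
    provenance: bool   # Can you trace the source?
    fidelity: bool     # Is it unchanged from the source?
    permission: bool   # Can you use it in this context?

    def failed_factors(self) -> list[str]:
        """Return the names of the factors that did not pass, in checklist order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def score(self) -> float:
        """Fraction of the six factors that passed (assumes equal weights)."""
        flags = [getattr(self, f.name) for f in fields(self)]
        return sum(flags) / len(flags)


# Example: a follower count whose origin cannot be traced and that
# is not consistently available (e.g. bot-inflated, then purged).
check = VeracityCheck(accuracy=True, precision=True, reliability=False,
                      provenance=False, fidelity=True, permission=True)
print(check.failed_factors())  # ['reliability', 'provenance']
print(round(check.score(), 2))  # 0.67
```

In practice each factor is rarely a clean yes/no; a real assessment would likely use graded scores or weights per factor, which is why the equal-weight `score()` here should be read only as a starting point.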
Black Hat SEO: Blogs
Twitter: 46% of brand followers are bots
Black Hat Social Marketing: Twitter
Or, in some cases, over 90%…
Disappearing Romney: Facebook as well…
And it is getting worse …
Trying to solve the Veracity problem …
The Big Guys are now doing Veracity …
Murali Krishnam
Part 3: The Opportunity for Social Data Scientists
“McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the U.S. that require quantitative and analytical skills. However, there will be a potential shortfall of 1.5 million data-savvy managers and analysts to fill these positions”
@cgtheoret @fffady
Zeitgeist