Luigi Curini - @Curini
VOICES from the Blogs & University of Milan
#Spoletta @ Mashable Social Media Day 30-Jun-2015
Separating the wheat from the chaff: Signal, Noise & Other stories in the Big (Data) World
http://voicesfromtheblogs.com
Big (or organic) data
They are:
• Big (in volume)
• Many (per unit of time)
• Unstructured (messy, not ready to be processed)
Sources:
• Administrative repositories
• Transaction data
• Social media & social networks
The blue side of Big data
Do they work at all?
Big data “believers”
Wired Magazine (2008): “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”.
Big data “detractors”
Financial Times (2014): “Big Data: Are We Making a Big Mistake?”
Time (2014): “Google’s Flu Project Shows the Failing of Big Data”
Big Data: the Big Fail?
Answer: “BIG Data” are today’s “data”, and “there are Big Data & small data scientists”
A good piece of advice: change the data scientist, not the data!
Let us focus on big data coming from Social Media
pros
• geo-localized data
• retrospective analysis (capture opinions when they are expressed)
• real-time analysis (continuous monitoring and/or alerting)
• speed of data analysis (if you know how to do it)
• gathering of unsolicited opinions
• census-type analysis: analyze the entire population of texts, not just a sample
cons
• the population on social media is not necessarily representative of the general population
• you can’t ask questions, you can only listen to people: if people don’t discuss a topic, you don’t have the data
• textual analysis: language evolves continuously and changes according to topic, medium, etc.
Three simple ideas
“Romance should never begin with sentiment. It should begin with science and end with a settlement.”
Oscar Wilde, An Ideal Husband
How to analyze Social Media data (1)
NO: mentions, likes or retweets. Computers are good at this, but humans can do better!
Obama: 16.8M followers
Romney: 0.6M followers
Final result: Obama +4.0%!
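To see how far raw counts can be from the outcome, read the follower numbers above naively as if they were vote shares (a deliberately simplistic assumption, used only to show the size of the gap):

```latex
\[
\frac{16.8}{16.8 + 0.6} = \frac{16.8}{17.4} \approx 96.6\%,
\qquad
\frac{0.6}{17.4} \approx 3.4\%
\]
\[
\underbrace{96.6\% - 3.4\%}_{\text{margin implied by followers}} \approx 93 \text{ points}
\quad \text{vs.} \quad
\underbrace{+4.0\%}_{\text{final result}}
\]
```

Follower counts measure audience size, not voting intention, which is why the implied margin misses the actual one by more than an order of magnitude.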
How to analyze Social Media data (2)
NO: ontological dictionaries or NLP rules
“This movie has good premises. Looks like it has a nice plot, an exceptional cast, first class actors and Stallone gives his best. But it sucks”
“Ibis redibis numquam peribis in bello” can be translated as “you will go, you will come back, you will not die in war”, but also the opposite way: “you will go, you will not come back, you will die in war”
“ragazza stufa scappa di casa… i genitori muoiono di freddo” (“fed-up girl runs away from home… the parents die of cold”, a pun on “stufa”, which means both “fed up” and “stove”)
“There is no favorable wind for the mariner who doesn’t know where to go” (Seneca)
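The movie review above is a good stress test for the dictionary approach. Below is a minimal sketch of a lexicon-based scorer, assuming a tiny hypothetical lexicon (`LEXICON`, `lexicon_score` and `lexicon_label` are illustrative names, not any real sentiment dictionary or library). It simply sums the polarity of the words it recognizes.

```python
import re

# Hypothetical toy lexicon (word -> polarity); real dictionaries are larger,
# but they share the same bag-of-words blind spot.
LEXICON = {"good": 1, "nice": 1, "exceptional": 1, "best": 1, "sucks": -1}


def lexicon_score(text: str) -> int:
    """Sum the polarity of every lowercase word token found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def lexicon_label(text: str) -> str:
    """Map the summed score to a label; the sign of the sum decides everything."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ("This movie has good premises. Looks like it has a nice plot, an "
          "exceptional cast, first class actors and Stallone gives his best. "
          "But it sucks")

print(lexicon_score(review), lexicon_label(review))  # 3 positive
```

Four positive words outweigh one negative one, so the review is scored as positive even though a human reader sees immediately that the final “But it sucks” overturns everything before it.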
Look into the data, not at the data
Switch to Supervised Techniques!
The advantages of human beings…
• Always in sync with linguistic expressions
[dictionaries are static]
• Completely language-independent
• Moreover…
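The supervised idea can be sketched with a tiny multinomial Naive Bayes classifier trained on human-coded posts. This is only a familiar stand-in: the slides do not say which supervised model VOICES uses, and the training posts below are invented. Because the model learns only from the tokens that human coders have labeled, nothing in it is tied to one language; the same code handles the Italian examples.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer: no language-specific rules."""
    return text.lower().split()


def train_nb(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs coded by humans."""
    class_docs = Counter()              # number of training posts per label
    word_counts = defaultdict(Counter)  # per-label token counts
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_logp = None, -math.inf
    for label, n_label in class_docs.items():
        logp = math.log(n_label / n_docs)  # log prior
        total = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training carry no evidence
                logp += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical hand-coded training posts (English and Italian mixed on purpose).
TRAINING = [
    ("great plot", "positive"),
    ("great actors", "positive"),
    ("film stupendo", "positive"),
    ("but it sucks", "negative"),
    ("boring plot", "negative"),
    ("film noioso", "negative"),
]

model = train_nb(TRAINING)
print(predict_nb(model, "the plot sucks"))    # negative
print(predict_nb(model, "un film stupendo"))  # positive
```

When the way people express an opinion changes, the fix is to code a few more posts and retrain, rather than to edit a static dictionary by hand.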
Beyond sentiment…
there is more information out there!!!
Opinions, reasons, attitudes, tones…
see the colours!
How to analyze Social Media data (3)
NO: individual classification followed by aggregation. Directly estimate the aggregate distribution of opinions!
We don’t care about the needle in the haystack...
...we care about the haystack! (G. King)
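Why does classify-then-count go wrong? Even a decent classifier makes systematic errors, and those errors do not cancel out when you count. The sketch below uses adjusted classify-and-count, a simple textbook quantification correction swapped in here for illustration. It is not the ReadMe estimator associated with G. King's work, nor necessarily what VOICES uses. It assumes the classifier's true-positive rate (`tpr`) and false-positive rate (`fpr`) have been estimated on a hand-coded validation set; the rates and the 20% true share are hypothetical.

```python
def classify_and_count(predicted_labels):
    """Naive estimate: share of posts the classifier labels positive (1)."""
    return sum(predicted_labels) / len(predicted_labels)


def adjusted_classify_and_count(observed_share, tpr, fpr):
    """Correct the naive share using the classifier's error rates.

    observed = true * tpr + (1 - true) * fpr
    =>  true = (observed - fpr) / (tpr - fpr)
    """
    if tpr <= fpr:
        raise ValueError("classifier must beat chance: tpr > fpr")
    estimate = (observed_share - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion


# Hypothetical corpus: 1000 posts, 200 truly positive (a 20% share).
# A classifier with tpr = 0.8 and fpr = 0.3 labels 160 of the 200 positives
# and 240 of the 800 negatives as positive.
predicted = [1] * 160 + [0] * 40 + [1] * 240 + [0] * 560

naive = classify_and_count(predicted)
corrected = adjusted_classify_and_count(naive, tpr=0.8, fpr=0.3)
print(f"naive: {naive:.2f}, corrected: {corrected:.2f}")  # naive: 0.40, corrected: 0.20
```

Counting individual labels doubles the true share, while estimating the aggregate directly (here by inverting the error rates) recovers it: the haystack, not the needles.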
What people don’t say when asked, but do discuss on social media!
Short Memory Effect!
Positive support | October 2014 | 7 & 8 January | Next week
World | 21.2% | 18.1% | 21.9%
Europe | 21.9% | 17.5% | 20.0%
France | 20.8% | 3.0% | 17.0%
See the large picture: the Moncler case study
www.voicesfromtheblogs.com | we capture the sentiment of the net
Monday, Nov. 3rd 2014: the day after the TV show Report aired a negative reportage on the Moncler company, online mentions of the brand (across Twitter, Facebook, Instagram, blogs, forums and other social channels) rose by about 450% compared to the average level.
That peak corresponded to a 22-point fall in the brand’s social reputation in just 24 hours (positive sentiment dropped from 75% to 53%, and to 43% on Twitter alone).
The company’s shares also fell by 5% on the stock exchange.
Was this due to the Social Media?
Obviously not: the negative trend was entirely predictable and independent of social media sentiment.
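A figure like “+450% compared to the average level” is the kind of signal a real-time alert is built on. Here is a minimal sketch, assuming a trailing mean of past daily counts as the baseline and a hypothetical alert threshold (`spike_alert` and `threshold_pct` are illustrative names; the slides do not say how VOICES defines the average level). The counts are invented.

```python
from statistics import mean


def percent_change_vs_baseline(history, today):
    """Percent change of today's count relative to the mean of past daily counts."""
    baseline = mean(history)
    return (today - baseline) / baseline * 100


def spike_alert(history, today, threshold_pct=200.0):
    """Return (change, alert); alert is True when change exceeds threshold_pct.

    threshold_pct is a hypothetical tuning parameter, not a value from the source.
    """
    change = percent_change_vs_baseline(history, today)
    return change, change > threshold_pct


# Hypothetical daily mention counts: a stable baseline, then a peak.
past_days = [1000, 950, 1050, 1000, 1000]
change, alert = spike_alert(past_days, today=5500)
print(f"{change:.0f}% vs baseline, alert={alert}")  # 450% vs baseline, alert=True
```

An alert like this only says that something is happening; as the Moncler case shows, it does not say that social media caused the outcome that followed.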
MIXING OFFICIAL STATISTICS AND SOCIAL MEDIA DATA: NOWCASTING
Wired Next Index
Official statistics: “cold data”, backward looking, low frequency, slow (GDP, imports/exports, number of companies, labour force statistics, etc.)
Surveys: “cold data”, slow, forward looking (e.g. consumer or entrepreneur expectations, etc.)
SM sentiment: “hot data”, nowcasting, forward looking
[Figure, period 1 January - 31 March 2014: hot indicators (self-expectation and expectation in the country, VOICES) vs. cold indicators (consumers’ expectation and entrepreneurs’ expectation, Istat); annotated shifts of -15, -10 and -18 days]
nowcasting = anticipation
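One common way to measure how many days a “hot” indicator leads a “cold” one is to shift the cold series back by k days and keep the k that maximizes their correlation. The sketch below does this on synthetic data with a built-in 15-day lead; the data, the 0-30 day search window and the names `pearson` and `best_lead` are assumptions, not necessarily the procedure behind the figure above.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def best_lead(hot, cold, max_lag=30):
    """Return (lag, corr) for the lag k maximizing corr(hot[t], cold[t + k])."""
    n = len(hot)
    scores = {k: pearson(hot[: n - k], cold[k:]) for k in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic daily series for a 90-day quarter (hypothetical data).
rng = random.Random(42)
n_days, true_lead = 90, 15
hot = [rng.gauss(0, 1) for _ in range(n_days)]
# The cold indicator repeats the hot one 15 days later, on a different scale;
# its first 15 days are unrelated noise.
cold = [rng.gauss(0, 1) for _ in range(true_lead)] + [
    10 + 0.5 * h for h in hot[: n_days - true_lead]
]

lag, corr = best_lead(hot, cold)
print(lag, round(corr, 3))  # 15 1.0
```

On real indicators the peak correlation will be far below 1, so the estimated lead should be checked for robustness (different windows, seasonality) before reading it as anticipation.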
VOICES from the Blogs was born in October 2010 as a scientific project to capture opinions expressed on the Web (social media, blogs, forums, web)
On 12/12/12 VOICES became a spin-off of the University of Milan (Italy) and started operations as an independent company
Up to January 2015, VOICES had analyzed more than half a billion posts written in Italian, English, French, Spanish, German, Russian, Arabic, Portuguese, Chinese and Japanese
In December 2014 VOICES was among the winners of the contest “Produrre Statistica ufficiale con i Big Data” (“Producing official statistics with Big Data”) promoted by &
About us
www.voicesfromtheblogs.com | we look into the data, not at the data
Since March 2015 SWG has been a partner of VOICES
Thanks to this partnership, the first integrated group in data science and business intelligence was born in Italy
But remember…
Big Data is likely to contribute so long as the desired qualities of the data are not negatively correlated with the quantity of data
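One simple way to see this warning in action is a small simulation under assumed numbers: a population in which 50% hold an opinion, a large self-selected sample in which holders of the opinion are more likely to post (so more data also means more skewed data), and a small simple random sample. All rates and sizes below are hypothetical.

```python
import random


def simulate(pop_size=200_000, true_share=0.5, p_post_yes=0.6, p_post_no=0.4,
             random_n=2_000, seed=1):
    """Compare a big self-selected sample with a small random sample.

    All parameters are hypothetical. Returns
    (population share, big-sample estimate, big-sample size, small-sample estimate).
    """
    rng = random.Random(seed)
    # Population: 1 = holds the opinion, 0 = does not.
    population = [1 if rng.random() < true_share else 0 for _ in range(pop_size)]
    actual = sum(population) / pop_size

    # "Big data": anyone may post, but holders of the opinion post more often.
    big = [x for x in population if rng.random() < (p_post_yes if x else p_post_no)]

    # "Small data": a simple random sample without replacement.
    small = rng.sample(population, random_n)

    return actual, sum(big) / len(big), len(big), sum(small) / random_n


actual, big_est, big_n, small_est = simulate()
print(f"population share:          {actual:.3f}")
print(f"self-selected (n={big_n}): {big_est:.3f}")
print(f"random sample (n=2000):    {small_est:.3f}")
```

With these assumed rates the self-selected sample is about 50 times larger, yet it overstates the share by roughly 10 points no matter how large the population grows, while the random sample of 2,000 is typically within about two points.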
In a nutshell…
Methods DO MATTER!
Thanks!
For more information, analyses and
white papers about the project visit us at
http://voicesfromtheblogs.com
On Twitter: @blogsvoices