Date post: 19-Sep-2020
PREDICTING REVENUE WITH SOCIAL MEDIA DATA: A COMPARISON BETWEEN TWITTER AND FACEBOOK FANPAGES Word count: 8913 Eline Debakker Student number: 01205061 Supervisor: Prof. dr. Dirk Van den Poel Co-supervisor: Steven Hoornaert Master’s Dissertation submitted to obtain the degree of: Master of Science in Business Engineering Academic year: 2016 - 2017
Page 1: PREDICTING REVENUE WITH SOCIAL MEDIA DATA · PREDICTING REVENUE WITH SOCIAL MEDIA DATA: “A COMPARISON BETWEEN TWITTER AND FACEBOOK FANPAGES” Word count: 8913 Eline Debakker Student

PREDICTING REVENUE WITH SOCIAL

MEDIA DATA: “A COMPARISON BETWEEN TWITTER AND FACEBOOK

FANPAGES”

Word count: 8913

Eline Debakker Student number : 01205061

Supervisor: Prof. dr. Dirk Van den Poel

Co-supervisor: Steven Hoornaert

Master’s Dissertation submitted to obtain the degree of:

Master of Science in Business Engineering

Academic year: 2016 - 2017

1 Confidentiality Agreement

I declare that the content of this Master’s Dissertation may not be consulted and/or reproduced.

Eline Debakker

2 Abstract

Abstract - Dutch (English translation)

This paper compares activity on Facebook and Twitter fanpages with respect to its

relationship to revenue. For this purpose, the quarterly revenue of several Fortune 500 companies

between 2014 and 2016 was used. Several parameters of activity on Facebook and Twitter fanpages are

compared jointly and separately. The results show a relationship between quarterly revenue and the number of

tweets sent by fans. Based on the results of this study, no relationship between Facebook

fanpage activity and revenue can be demonstrated.

Abstract - English

This paper studies the relationship between Twitter and Facebook fanpage activity and quarterly revenue for

Fortune 500 companies between 2014 and 2016. A comparison is made to study these effects both separately

and together. Results indicate that shocks in the sentiment of fans on Twitter influence a brand’s quarterly

revenue. We found no significant evidence of a relationship between Facebook fanpage activity and quarterly

revenue.

Keywords: Fortune 500, sentiment, WoM, VAR-analysis, social network sites

3 Foreword

In this foreword, I would like to thank a few people for helping me to write this thesis.

I wish to thank Prof. Dr. Dirk Van den Poel for giving me the chance to write this master’s dissertation.

I also thank Mr. Steven Hoornaert for his helpful advice and coaching over the last two years.

Finally, I would like to thank my mother, my sisters, family and friends for supporting me during my

studies.

Contents

1 Confidentiality Agreement

2 Abstract

3 Foreword

1 Introduction

2 Literature

2.1 Online Brand Communities

2.2 Facebook, Twitter and their impact on business

3 Data & Methodology

3.1 Data

3.1.1 Operating revenue

3.1.2 Social media data

3.2 Variables

3.2.1 Sentiment Analysis

3.2.2 Volume-related variables

3.3 The VAR model

4 Results and Discussion

4.1 Descriptive Statistics

4.2 Facebook & Twitter model

4.2.1 Tests for stationarity

4.2.2 Granger-Causality tests

4.2.3 Model selection

4.2.4 Impulse response functions

4.3 Twitter model

4.3.1 Tests for stationarity

4.3.2 Granger-Causality tests

4.3.3 Model selection

4.3.4 Impulse Response functions

4.4 Facebook model

4.4.1 Tests for stationarity

4.4.2 Granger-Causality tests

4.4.3 Model selection

4.4.4 Impulse Response functions

4.5 Conclusion

5 Limitations and future research

List of Tables

1 Companies included in dataset per sector

2 Variables and description of panel dataset

3 Example of Valence shifters

4 Overall descriptive statistics

5 Summary statistics (1/2)

6 Summary statistics (2/2)

7 Correlation Analysis

8 Augmented ADF-tests

9 Granger causalities tests

10 LM-test for residual autocorrelation

11 Descriptive statistics of Twitter variables

12 Correlation analysis Twitter variables

13 Granger-causality tests for Twitter variables

14 Lag selection criteria for Twitter model

15 LM-test for residual autocorrelation in the Twitter model

16 Descriptive statistics of Facebook variables

17 Correlation analysis of Facebook variables

18 Augmented ADF-tests for Facebook variables

19 Granger-causality tests for Facebook variables

20 Lag selection criteria for the Facebook model

21 LM-test for residual autocorrelation in the Facebook model

List of Figures

1 Quarterly revenues

2 Estimation output of the VAR(2) model

3 Impulse response functions

4 Impulse response values (Revenue and Posts)

5 Impulse response values (Tweets and Facebook likes)

6 Estimation output of the Twitter model

7 Impulse response functions of the Twitter model

8 Impulse response values (Revenue and Twitter sentiment)

9 Estimation output of Facebook VAR(2) model

10 Impulse response functions of the Facebook model

11 Impulse response values (Posts and Facebook sentiment)

12 Impulse response values (Likes)

1 Introduction

As customers are becoming more resistant towards traditional marketing programs, companies have turned

to social network sites to engage and communicate with their customers (Bagozzi and Dholakia, 2006). The

existence of social network sites (SNS) such as Facebook and Twitter has transformed customers from

passive receivers of content spread by a brand to active co-creators and broadcasters of brand messages.

Brand lovers gather in online brand communities, whether or not these are organized by a company. Online

brand communities are a form of consumer community and are important for driving purchases. In this

respect, user-generated content (UGC), i.e. content created by brand community members, is key. UGC is

more influential than content created by the brand. Social media activities influence the consumer buying

process (Saboo et al., 2016). Engagement in social media brand communities significantly increases consumer

purchases (Ping et al., 2012). Negative UGC caused by negative events such as a competitor’s product recall

also has an effect on a company’s sales (Borah and Tellis, 2015). Companies benefit from tracking and

encouraging customer engagement behavior in communities, because it leads not only to more commenting

and liking, but also to more buying (Johanna Gummerus et al., 2012).

Brand communities grant companies access to a great number of consumers at low cost, at high speed and

with ease of applicability. They are efficient in customer-to-customer based information exchange and learning

(Zaglia, 2013). Indeed, customers often add value by generating content and even become ardent advocates

for the seller’s products and can influence purchase decisions of others in peer-to-peer interactions (Sashi,

2012).

On Facebook, brand communities are created on the brand’s fanpage. On Twitter, brand pages typically

revolve around news and message sharing. Consumers follow brands or brand advocates on Twitter for social

interaction, brand usage and information-seeking reasons. These motivations are significant predictors of

consumer-brand relationship variables such as brand community commitment (Kwon et al., 2014). Policies

on Twitter and Facebook state that only an authorized representative of a brand can create

and administer a brand’s fanpage. However, apart from official fanpages hosted by the company, there is

an abundance of unofficial fanpages hosted by consumers (Facebook; Twitter).

Research has investigated the link between social media and real-life outcomes. In the last decade, a growing

body of research has used social media data for forecasting models, predicting box office revenues for movies

(Asur and Huberman, 2013), quarterly sales of Nike apparel (Boldt et al., 2016) and Apple’s iPhone (Lassen

et al., 2014). Most studies use information from one social platform, thereby ignoring the wider social

ecology of interaction and diffusion between multiple social network sites (Tufekci, 2014). Indeed, online social

media is part of an ecosystem where traditional and social media work together to engage customers and

promote products or services (Hanna et al., 2011). To our knowledge, only three studies have used social big data

from more than one platform. Saboo used social media metrics from Myspace, Facebook, Twitter, Last.FM

and YouTube (Saboo et al., 2016), Boldt combined data from Facebook pages with online search volume via

Google Trends and Google Shopping (Boldt et al., 2016), and Oh studied the impact of volume-related

metrics of Twitter, Facebook and YouTube on revenue (Oh et al., 2017).

Despite the acknowledgement of social media as a medium to drive sales and a growing body of research

investigating the matter, there is little understanding of how consumer behavior influences sales across

social media platforms. After all, by using only one single social media platform, the wider social ecology

of interaction and diffusion is overlooked (Tufekci, 2014). This study investigates which social variables are

linked to revenue across multiple platforms, more specifically Facebook and Twitter. A comparison of the

impact of metrics on revenue is made. As 86% of Fortune 500 companies are present on Twitter and 84% on Facebook

(Statista, 2017), we focus our study on the fanpages of these companies. As far as we know, only Apple

(#3 in 2016) and Nike (#91 in 2016) have been studied before in this respect. Research using data from

multiple companies has not yet been done. General insight into which parameters influence revenue

on which platform will make it easier for companies to monitor their social media activity. In addition,

knowing which parameters influence revenue could help companies to encourage fans to take certain actions

on Facebook and Twitter. This could lead to an increase in revenue. In this way, both companies and fans

will benefit from companies’ efforts to leverage their activity on these social media platforms. This study

aims to provide guidelines on how to leverage social media that are not specific to one company. In this

respect, our research adds to the existing literature, as it compares the impact of social media on business

outcomes of multiple companies across multiple platforms.

The remainder of this paper is structured as follows. The next section reviews related literature

and the relevant social metrics used in our study. Then, we discuss the data and the methodology used to build

multiple models. In section 4, the results are described and conclusions are drawn. Finally, we mention

limitations and give suggestions for future research.

2 Literature

2.1 Online Brand Communities

A brand community is a specialized, non-geographically bound community, based on a structured set of

social relationships among admirers of a brand (Muniz and O’Guinn, 2001). The community consists of a

collective of people with a shared interest in a specific brand, creating a subculture around the brand with

its own values, myths, hierarchy, rituals and vocabulary (Cova and Pace, 2006). Compared to traditional

communities, it takes less time to bond with customers (de Valck et al., 2009), the costs are lower and the

efficiency level of engagement is higher (Kaplan and Haenlein, 2010; Albert et al., 2008). By exchanging

information, experiences, knowledge or simply love for the brand, members can influence other customers’

relationship with and attitude towards the brand (McAlexander et al., 2002).

Over the last few years, brand communities have emerged on social media. On Facebook, people can become

a “fan” of a brand on its fanpage. Characteristics of a fan are self-identification as a fan, emotional

engagement, cultural competence, auxiliary consumption and co-production (Kozinets, 2008). In this regard,

a fanpage, hosted by a company or a consumer, with its active members can be seen as a brand community.

Zaglia showed that fanpages indeed exhibit brand community characteristics: perceived membership, social

identity, higher moral responsibility and fulfillment of members’ need for information (Muniz and O’Guinn, 2001;

Zaglia, 2013).

A fanpage also serves as a platform to communicate concerns and suggestions and to receive social enhance-

ment (Zaglia, 2013). On Twitter, brand followers do not perceive the brand page as a traditional brand

community (Kim et al., 2014). Brand lovers on Twitter have little motivation to engage with other users

(Kwon et al., 2014). The community is not bound by close ties between users. Unlike Facebook, Twitter

was not designed to support the development of communities. Nevertheless, brand fans still share a

sense of community with the brand and other brand admirers within an “imagined community” composed

of sets of interlinked “personal communities” (Tiryakian et al., 2011). Communities in a network on Twitter

can be defined as a group of nodes, representing Twitter users, more densely connected to each other than to

nodes outside the group. These communities are often topical or based on shared interest, such as a brand

(Java et al., 2007).
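
The density-based definition of a Twitter community given above can be illustrated with a small sketch. The follower graph, the user names and the helper functions `edge_density` and `cross_density` are hypothetical and chosen for illustration only; they are not taken from the data or method of this study. The sketch compares the share of realized ties inside a group with the share of realized ties between that group and the rest of the network.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Share of possible undirected ties among `nodes` that are present in `edges`."""
    nodes = set(nodes)
    possible = len(nodes) * (len(nodes) - 1) // 2
    if possible == 0:
        return 0.0
    present = sum(
        1 for a, b in combinations(sorted(nodes), 2) if frozenset((a, b)) in edges
    )
    return present / possible


def cross_density(group, others, edges):
    """Share of possible ties between `group` and `others` that are present in `edges`."""
    possible = len(group) * len(others)
    if possible == 0:
        return 0.0
    present = sum(1 for a in group for b in others if frozenset((a, b)) in edges)
    return present / possible


# Hypothetical network of six Twitter users with undirected ties (assumption).
edges = {
    frozenset(pair)
    for pair in [
        ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),  # fans of one brand
        ("dan", "eve"), ("eve", "fay"),                  # other users
        ("cat", "dan"),                                  # single tie across groups
    ]
}
fans = {"ann", "bob", "cat"}
others = {"dan", "eve", "fay"}

inside = edge_density(fans, edges)             # 3 of 3 possible ties are present
outside = cross_density(fans, others, edges)   # 1 of 9 possible ties is present
is_community = inside > outside                # denser inside than towards outsiders
print(inside, outside, is_community)
```

In this toy network the three fans are fully connected to each other but share only one tie with the remaining users, so the group qualifies as a community under the definition above.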

Marketers have recognized the importance of online communities as a way to enhance brand advocacy and to

understand the tastes and decision-making influences of consumer groups (Kozinets, 2008). For a company,

the added value of brand community members is greater than that of other customers because of

four dynamics (Joseph P. Cothrel, 2000). The buy-sell dynamic provides members with the information

they need to buy products. The referral dynamic indicates that community members, as hyper-affiliates, tend to

spread the word about products and companies they love. The self-select dynamic makes it easier for sellers to send

relevant offers to targets, because members already signal their interest by joining the community. Lastly,

the repeat-visit dynamic states that community members are repeat visitors, who will buy more and more

often, since the seller has more opportunities to reach them. Empirical evidence shows that membership

in a social media brand community has an impact on purchase behavior. Community members visit a store

more frequently than non-members. They also spend more, which shows that attitudinal loyalty to the

brand’s community translates to behavioral loyalty (Ping et al., 2012). More participation in generating

content and interaction is what makes a brand community sustainable and profitable, because the level of

participation influences consumer intention (Ding et al., 2017).

Customers often seek advice on their social networks. This advice is perceived as highly valuable and

is later used to make buying decisions or to learn (Zaglia, 2013; Wu et al., 2010). Bagozzi defined these

motivations as informational and instrumental benefits, distinguishing them from entertainment benefits

such as fun and relaxation (Bagozzi and Dholakia, 2006). Social benefits include recognition and friendship.

Economic benefits occur when members can receive discounts or participate in competitions (Gwinner et al.,

1998). The commitment of members towards the community is influenced by community interactions and

rewards for their activities, which in turn influences loyalty towards a brand (Jang et al., 2008).

2.2 Facebook, Twitter and their impact on business

Besides its impact on people’s lives, social media has also influenced corporations and their marketing

strategies. Companies realized the need to be active on social media and nowadays they spend an increasing

amount of time determining a social media strategy. A company’s presence on social media can affect brand

loyalty and credibility, which in turn influences customer engagement and sales (Edosomwan et al., 2011).

However, consumers are no longer just passive receivers in a company’s marketing program. Instead, they

have become active creators of content. Individuals looking for information about a brand’s product are

exposed to a huge amount of content that is beyond that company’s control. Social media platforms provide

the tools necessary for individuals to spread their own opinions and experiences, thereby influencing others

in the network. A growing number of social media users consult their social community before making

purchase decisions. The consumer decision process is impacted by social referrals from friends and strangers

alike (Barnes, 2014). Companies tap into this dynamic by providing functions on their websites that facilitate

this social sharing. Using Facebook’s Like function or Twitter’s tweets, consumers are encouraged to share

commercial information with their social network. This is called social commerce, a type of e-commerce in

which user-ratings, online communities and social advertising are used to facilitate online shopping (Liang

et al., 2011). Companies can gain more insight into the interests of their customers by using social media big

data. There is also a growing body of research investigating the relationship between social media

and real-world outcomes. Kalampokis reviewed 52 empirical studies regarding the use of social media for

different business outcomes to create a social media analysis framework (Evangelos Kalampokis et al., 2013).

Our study focuses on the link between Facebook and Twitter big data and revenue and sales. Prior research

already revealed the existence of the relationship between these platforms and business outcomes. We will

discuss these studies, focusing on the social media metrics that have already been linked to revenue.

While early research used blogs and online review platforms such as those on Amazon.com, more

research nowadays uses data from social network sites, particularly Twitter. Using social network sites as

a way to capture consumer behavior can be justified by the existence of online brand communities, where

like-minded people with similar interests are gathered. Online communities allow users to express their

personal preferences, to share recommendations and to identify trusting members. Consumers are more

likely to believe recommendations from people they trust (Wu et al., 2010). This could be friends and

family, but also fellow community members. Walsh argues that social network sites such as Facebook are

increasingly driving traffic to retail sites, indicating that communities might often be consulted before the

purchase decision is made (Walsh, 2006). The activity of Facebook fanpage members or followers of a Twitter

page can be measured using two types of variables: volume-related variables and sentiment variables. While

the former refer to the volume of chatter and other social media activity, the latter focus on what is

said (DeMarco, 2013). Both measurements are a way to describe Word-of-Mouth (WoM). WoM is the process of

conveying information from person to person (Jansen, 2009). It involves sharing opinions or reactions about

businesses or products. In its online form, this electronic WoM is considered trusted information, spread

by both friends and strangers in the immediate social network (Barnes, 2014).

This study compares the link between revenue and social media metrics of Twitter and Facebook. The

effect of each of these variables on revenue and vice versa should be studied simultaneously (Duan et al.,

2008; Rui et al., 2013; Ding et al., 2017; Oh et al., 2017). Moreover, most of the companies included in

this study are active on both Twitter and Facebook, which means that information can flow between the

two platforms (?). Tufekci and Wilson found that social media users alternate between Facebook, Twitter,

broadcast media, cell-phone conversations, texting, face-to-face and other methods to interact and share

information (Tufekci, 2014). Behavior of fans on one platform might thus influence outcomes on the other.

Hypothesis 1: Twitter social media metrics have an influence on Facebook social media metrics.

Hypothesis 2: Facebook social media metrics have an influence on Twitter social media metrics.

As social media metrics, Facebook and Twitter sentiment, the volume of fanpage posts, likes and tweets will

be studied.

The content of tweets, posts and comments is analyzed to find opinions, thoughts or feelings (i.e. sen-

timent) which are then given a score based on whether these feelings are positive or negative. Sentiment

variables reveal sentiment of customers captured in online reviews, product ratings, blogposts and tweets

(Gruhl et al., 2005; Mishne and Glance, 2006; Chevalier and Mayzlin, 2006; Dhar and Chang, 2009; Onishi

and Manchanda; Chern et al., 2015; Lassen et al., 2014; Archak et al., 2011; Asur and

Huberman, 2013; Chen et al., 2015; Zhu and Zhang, 2010). Harnessing this sentiment and text mining could

significantly improve traditional prediction methods (Zhi-Ping Fan, 2017). Sentiment analysis classifies the

words in a text as positive (+1), negative (-1) or neutral (0). Based on

these polarized words, each sentence is given a total sentiment score, which can help explain revenue performance.

A weakness of sentiment analysis of social media is the noisy and informal nature of social media, causing

predictors to perform poorly (Ding et al., 2017).
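
The word-level scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the mini-lexicon `LEXICON`, the helper names `sentence_score` and `text_scores`, and the example text are all hypothetical, and the sketch ignores valence shifters such as negation. Each word is mapped to +1, -1 or 0, and the word scores are summed per sentence.

```python
import re

# Hypothetical mini-lexicon (assumption); words not listed count as neutral (0).
LEXICON = {
    "love": 1, "great": 1, "happy": 1,
    "bad": -1, "broken": -1, "hate": -1,
}


def sentence_score(sentence):
    """Sum the polarity (+1, -1 or 0) of every word in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def text_scores(text):
    """Split a text into sentences and return one sentiment score per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [sentence_score(s) for s in sentences]


example = "I love these great shoes. The box arrived broken! Delivery was on Monday."
print(text_scores(example))
```

Summing rather than averaging keeps the sketch close to the "total sentiment score" wording above; a real lexicon would be far larger and would need to handle the informal language typical of tweets and comments.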

The effect of social media Word-of-Mouth (WoM) on revenue has been studied several times. A significant

effect of positive and negative tweets on box office revenue has been found, which suggests that it is impor-

tant for companies to monitor the sentiment of tweets (Rui et al., 2013). Consumers looking for product

information who come across more positive WoM will evaluate the product more positively. This will influence

their buying decision. If this reasoning is correct, an increase in sentiment (WoM) will indeed lead to higher

revenues. We will test if this relationship exists for the Fortune 500 companies included in our dataset.

Specifically,

Hypothesis 3: Twitter sentiment has an influence on quarterly revenue.

Hypothesis 4: Facebook sentiment has an influence on quarterly revenue.

Hypothesis 5: Quarterly revenue influences Twitter sentiment.

Hypothesis 6: Quarterly revenue influences Facebook sentiment.

Volume-related data such as the count of daily mentions, number of MySpace friends, tweet rate, number

of tweets and retweets, Facebook posts and likes and other social media metrics have been linked to revenue

(Gruhl et al., 2005; Dhar and Chang, 2009; Lassen et al., 2014; Asur and Huberman, 2013; Saboo et al.,

2016; Boldt et al., 2016). There is a correlation between the amount of attention a topic receives and its

success in the future. Asur and Huberman showed that the volume of tweets can serve as a proxy for users’

attention towards a product (Asur and Huberman, 2013). Boldt used different Facebook variables such as

the number of likes and users, assuming that activity on fanpages can serve as a proxy for collective opinions

and attention (Boldt et al., 2016). Drawing on the AIDA framework, this attention can lead to purchases

(Lassen et al., 2014). Although Facebook is currently the largest social media platform, few studies have

used its datasets as a resource for studying real-world outcomes. Compared to Twitter, privacy settings

on Facebook make a lot of profiles inaccessible to the public internet (more than 50% on Facebook versus

less than 10% on Twitter) (Tufekci, 2014). Our analysis also faced this issue. Brand community markers

are strongest in closed Facebook Groups, which are often private, making their data inaccessible. Fortunately for

our analysis, Facebook fanpages are public and show brand community characteristics, making them an

acceptable substitute. In her paper, Tufekci advocates more multi-platform analysis. Differences in the

structure of Twitter and Facebook data and of the population using these platforms make it interesting to

compare them in a forecasting model. Nevertheless, a recent study found that Facebook data can be used

for forecasting sales of Nike products (Boldt et al., 2016). Nearly 4 in 10 Facebook users report that their

behavior on Facebook such as liking, sharing or commenting on an item has already led to them buying it

(Barnes, 2014). Facebook fans spend nearly five times the amount on products of their favorite brand in

comparison with non-fans (Hollis, 2011). Oh studied the effects of Facebook likes, posts and comments on

revenue (Oh et al., 2017). Indeed, the Facebook like has a positive impact on revenue, and vice versa (Ding

et al., 2017). Similarly, the relationship between revenue and Twitter has been studied using Twitter

variables. Research shows that the volume of tweets has a positive effect on revenue, as does the number of

followers (Rui et al., 2013; Oh et al., 2017). Online Word-of-Mouth on blogs influences revenue, but increases

in revenue also positively influence online WoM volume. This effect is short-lived, which

can be explained by the fast, real-time nature of Twitter. Additionally, Duan showed that the sentiment of

WoM also influences the volume (Duan et al., 2008).

WoM influences revenue in two ways. On the one hand, the sentiment influences how a consumer evaluates a

product and ultimately also their purchase decision. On the other hand, the volume of WoM increases

awareness. Similar to the volume of likes, this will increase revenue, as suggested by social impact theory (Latané,

1981). The same theory states that the immediacy and tie strength between individuals will determine how

much an individual is influenced. In communities, where ties are strong, this could lead to a significant

increase in purchase decisions. Based on previous research we expect there is a significant relationship and

revenues for Fortune 500 companies as well.

Hypothesis 7: The volume of tweets influences quarterly revenue.

Hypothesis 8: The volume-related metrics on Facebook have an impact on quarterly revenue.

Hypothesis 9: Quarterly revenue influences the volume of tweets.

Hypothesis 10: Quarterly revenue influences the volume-related metrics on Facebook.


3 Data & Methodology

3.1 Data

The empirical data used in this study were retrieved from online data sources. Quarterly operating revenue of Fortune 500 companies was retrieved from Knoema.com. Social media data were retrieved from Facebook and Twitter: posts and comments were collected from Facebook fanpages, and tweets and replies from Twitter, to capture customer sentiment. Fanpages created by Fortune 500 companies as well as by their fans were included.

3.1.1 Operating revenue

Quarterly operating revenues were collected from the online data search engine Knoema.com. To ensure correctness, the list of companies was compared with the official list on the Fortune 500 website (Fortune500, 2016). The quarters are calendar quarters, running from January 1st 2014 to the end of the third quarter of 2016 (September 30th). Operating revenue is revenue generated from a company’s day-to-day business activity. It comprises the revenue generated from selling the company’s products or services. In contrast, the term revenue refers to the proceeds from sales and other sources of income.

3.1.2 Social media data

Both Fortune 500 companies and their fans can host fanpages on Facebook. Social data (posts and comments) were extracted from user- and company-created fanpages. A Facebook fanpage is denoted as “@ + page name”. Any user is allowed to create a page expressing interest in a brand. Users are asked to make clear that such a page is not the official fanpage by indicating so in the name, info or bio. Companies have the right to file a trademark report about any account that violates these policies and have it removed, but since it is so easy to create new fanpages, it is impossible to keep track of them all. Nowadays, as long as the page does not speak in the voice of the brand, these user-created fanpages are often allowed to continue their activities. In order to capture the behavior of fans of the Fortune 500 companies as well as possible, we use both company-hosted and fan-hosted fanpages.

Different pages generate different levels of interaction, indicated by the number of page likes of a fanpage. For instance, “@fansofapple”, a fan-hosted fanpage with 868,838 page likes on May 14th 2017, states that “Fans of Apple is an independent Facebook fan page and is not in ANY way, shape or form affiliated with Apple, Inc.” Companies can also have fanpages apart from their official page. These pages are specifically intended for community purposes, such as “@RNspire”, a nurse appreciation page monitored by Cardinal Health and dedicated to informing, inspiring and recognizing the community.

Figure 1: Quarterly revenues

On Twitter, tweets and replies were scraped from Twitter pages created by fans and companies. Unlike Facebook, Twitter has no specific construct for a fanpage. A “fanpage”, created with the purpose of informing, entertaining and conversing with fellow brand lovers, is simply a Twitter account that mentions in its name or bio that it is a fanpage. Again, some companies had an abundance of Twitter fanpages. For each company, the three accounts with the largest number of followers were selected. For example, “@CultOfMac: a site that follows everything Apple. News, reviews and how-tos” had 783,362 followers and was the largest Apple-related fanpage on Twitter on May 14th 2017.

With later analysis in mind, only English-language pages were used. Pages created with the purpose of spreading coupons or supporting the brand’s sports team, as well as hate and charity pages, were excluded. Company-hosted fanpages created for divisions, internal purposes or specific product lines were also excluded, so that only pages for people with a shared interest in the brand remained, in accordance with the definition of a brand community. Fanpages were found for only 178 of the Fortune 500 companies. Those pages could be created by a fan, the company itself or both. However, we were not able to extract data from all pages. We omitted pages that did not have any posts or activity between 2014 and 2016, as well as pages that were not created as a Facebook page (but as a Group or Person). Eventually, we were left with Facebook data for 33 companies. In a similar way, we extracted Twitter data for 52 of the 74 companies for which we found at least one Twitter fanpage. An overview of the Fortune 500 companies included in the dataset per industry is provided in table 1.


Industry | Companies in dataset with Facebook fanpage(s) | Companies in dataset with Twitter fanpage(s)
Aerospace & Defense | Boeing | Lockheed Martin, Northrop Grumman, Textron, United Technologies
Apparel | Nike | Nike
Business Services | - | Aramark, MasterCard, Visa, Western Union
Energy | NextEra Energy | Centerpoint Energy, Chevron, Conoco Phillips, Devon Energy, Exxon Mobil
Financial Services | - | Berkshire Hathaway
Food & Drugstores | Kroger, RiteAid | CVS Caremark, Supervalu, Walgreens, Whole Foods Market
Food & Tobacco | Archer Daniels Midland, Dr Pepper Snapple Group | Coca-Cola
Healthcare | Cardinal Health | Cardinal Health
Hotels & Leisure | Marriott International | -
Household products | Avon Products, Kimberly-Clark | Procter & Gamble
Industrials | Caterpillar, John Deere, Whirlpool
Media | Walt Disney | Discovery Communications, Live Nation Entertainment, Walt Disney
Motors | Ford Motors | Ford Motors, General Motors, Goodyear Rubber & Tire
Retailing | Costco, Home Depot, Netflix, Ross Stores | Netflix
Technology | Alphabet Google, Amazon, Apple, Cisco Systems, Oracle, Salesforce | Alphabet, Apple, Cisco Systems, IBM, Intel, Microsoft, NetApp, Oracle, Salesforce
Telecommunications | AT&T, Comcast, Windstream | AT&T, Comcast, Windstream
Transportation | American Airlines, CSX, JetBlue Airways, Norfolk Southern, Union Pacific | American Airlines, Delta Airlines, JetBlue Airways

Table 1: Companies included in dataset per sector


We selected a dataset of 14 companies for which we found fanpages on both Facebook and Twitter: Apple, Ford Motors, AT&T, Kroger, Nike, Walt Disney, JetBlue Airways, Cisco Systems, Oracle, Comcast, Windstream, Cardinal Health, Salesforce.com and Alphabet Google. The evolution of their quarterly revenues is given in figure 1.

Variable | Description
Date | Year & Quarter
Company | Index of the company
Revenue | Quarterly revenue per company (million dollars)
Posts | Volume of Facebook posts
Tweets | Volume of tweets
Twsent | Rate of positive tweets
Fbsent | Rate of positive posts & comments
Likes | Volume of likes

Table 2: Variables and description of panel dataset

3.2 Variables

An overview of the variables used in the study is given in table 2. The data are structured as an unbalanced panel. The time variable is Date, ranging from 2014Q1 to 2016Q3; the cross-sectional variable is Company.

3.2.1 Sentiment Analysis

In order to determine the Twitter and Facebook sentiment, a sentiment analysis was performed using the sentimentr package (Rinker, 2017). Tweets were used for Twitter; for Facebook, we used posts and comments. Each message was preprocessed using the tm package in the following ways:

• Elimination of URLs, hashtags and mentions of other account names (“tagging someone”)

• Elimination of surplus white space and other special characters

• Elimination of numbers

• Transformation of content to lower case
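The preprocessing steps listed above can be sketched as follows. This is only a minimal illustration in Python using regular expressions, not the tm-based code used in the study; the function name `clean_message` and the exact patterns (for instance, which punctuation marks are kept) are assumptions made for the example.

```python
import re


def clean_message(text):
    """Apply the listed preprocessing steps to one tweet or post (illustrative sketch)."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # remove mentions and hashtags
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    # Assumption: keep letters and basic punctuation, drop all other special characters.
    text = re.sub(r"[^A-Za-z\s.,;!?']", " ", text)
    text = re.sub(r"\s+", " ", text).strip()             # collapse surplus white space
    return text.lower()                                  # transform to lower case
```

Commas, semicolons and sentence-ending punctuation are kept in this sketch because the later scoring step relies on sentence boundaries and pause words.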

Then, each message was given a polarity score. The sentiment function in sentimentr calculates text polarity sentiment for each sentence in a text by combining standard dictionary look-up for polarized words with weighting for valence shifters. The words in each post or tweet were compared with polarized words in the sentiment dictionary. We used the Jockers dictionary in the lexicon package. We tried to calculate sentiment as accurately as possible, taking into account the limited length of tweets and Facebook posts. While simple dictionary methods often perform well when determining the overall sentiment of a long text, they are less appropriate for determining the sentiment of a single tweet, since they do not take into account the many words that can shift or reverse the sentiment of a polarized word. These words are called “valence shifters” (i.e. negators, amplifiers, de-amplifiers and adversative conjunctions). Table 3 gives examples of valence shifters obtained from the @CultOfMac fanpage. It shows that valence shifters occur often in a social media context and influence the polarized word they are associated with. For example, in ‘Trump tweets’, the co-occurrence rates of valence shifters with polarized words are 19% for negators, 18% for amplifiers, 4% for de-amplifiers and 4% for adversative conjunctions (Rinker, 2017). We preferred the sentiment function over standard dictionary look-up methods to make sure these valence shifters are accounted for.

Valence shifter | Tweet example
Negator | I do not expect, nor desire, software making such edits for me. Not OK
Amplifier | I really hope they don’t do that
De-amplifier | Hardly seems like news, or relevant to anything
Adversative conjunction | I use Bluetooth. However, trying to find a headset with lightning connector.

Table 3: Example of valence shifters

To determine the polarity score, each sentence $s_j$ in a tweet or post $t_i = \{s_1, s_2, ..., s_n\}$ is broken into a bag of words $s_{i,j} = \{w_1, w_2, ..., w_n\}$. For example, $w_{1,2,3}$ is the third word in the second sentence of the first tweet. All punctuation is removed except for commas and semicolons; these pause words $cw$ are considered as words within the sentence. The algorithm first compares each word $w_{i,j,k}$ to a dictionary of polarized words. Each positive ($w^{+}_{i,j,k}$) or negative ($w^{-}_{i,j,k}$) word is tagged with +1 or -1 respectively. Each polarized word $pw$ forms a polar cluster ($c_{i,j,l} \subseteq s_{i,j}$) together with the words around it, so a cluster can be represented as:

$$c_{i,j,l} = \{pw_{i,j,k-nb}, ..., pw_{i,j,k}, ..., pw_{i,j,k+na}\}$$

with $nb$ and $na$ being respectively the number of words to be considered before and after the polarized word. The upper and lower bounds of the polarized word cluster are constrained by the pause words. In this way, the algorithm takes changes of thought into account. The lower bound is constrained to $\max\{pw_{i,j,k-nb}, 1, \max\{cw_{i,j,k} < pw_{i,j,k}\}\}$ and the upper bound to $\min\{pw_{i,j,k+na}, w_{i,jn}, \min\{cw_{i,j,k} > pw_{i,j,k}\}\}$, with $w_{i,jn}$ the number of words in sentence $j$. In the context cluster each word is labeled as either neutral, negator, amplifier or de-amplifier. Based on the polarity_dt argument in the sentiment function, each polarized word is given a weight $w$, which is then modified by the valence shifters surrounding the word.

• Neutral words ($w^{0}_{i,j,k}$) affect the word count, but have no influence on the polarity score.

• Amplifiers ($w^{a}_{i,j,k}$) increase the polarity of a polarized word by a factor of 1.8, since we used the default weight $z = 0.8$.

• De-amplifiers ($w^{d}_{i,j,k}$) reduce the polarity of a polarized word. Amplifiers become de-amplifiers if the context cluster has an odd number of negators ($w^{n}_{i,j,k}$).

• Negators ($w^{n}_{i,j,k}$) flip the sign of a polarized word and act on amplifiers and de-amplifiers. Since two negatives make a positive and three negatives a negative, the negation is determined by raising -1 to the power of the number of negators ($w^{n}_{i,j,k}$) plus 2.

• Adversative conjunctions before a polarized word up-weight a cluster by $1 + z_2 * \{|w_{adv.conjunction}|, ..., w^{p}_{i,j,k}\}$, with the weight $z_2$ equal to the default 0.85. If the adversative conjunction occurs after the polarized word, it down-weights the cluster by $1 + \{w^{p}_{i,j,k}, ..., |w_{adv.conjunction}| * -1\} * z_2$. In other words, the numbers of occurrences before and after the polarized word are multiplied by 1 and by -1 respectively; this value is multiplied by $z_2$ and then 1 is added.

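Finally, the weighted context clusters $c_{i,j,l}$ are summed up to $c'_{i,j}$, and this value is then divided by the square root of the word count to yield a polarity score per sentence

$$\delta = \frac{c'_{i,j}}{\sqrt{w_{i,jn}}}$$

with:

$$c'_{i,j} = \sum \left( (1 + w_{amp} + w_{deamp}) \cdot w^{p}_{i,j,k} \cdot (-1)^{2 + w_{neg}} \right)$$

$$w_{amp} = \sum \left( w_{neg} \cdot (z \cdot w^{a}_{i,j,k}) \right)$$

$$w_{deamp} = \max(w_{deamp'}, -1)$$

$$w_{deamp'} = \sum \left( z (-w_{neg} \cdot w^{a}_{i,j,k} + w^{d}_{i,j,k}) \right)$$

$$w_{b} = 1 + z_2 * w_{b'}$$

$$w_{b'} = \sum \left( |w_{adv.conjunction}|, ..., w^{p}_{i,j,k}, w^{p}_{i,j,k}, ..., |w_{adv.conjunction}| * -1 \right)$$

$$w_{neg} = \left( \sum w^{n}_{i,j,k} \right) \bmod 2$$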
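To make the scoring steps more concrete, the sketch below computes a polarity score for one tokenized sentence. It follows the prose description (pause-bounded context cluster, negators, amplifiers, de-amplifiers, division by the square root of the word count) and is deliberately simplified: it is not the sentimentr implementation, it ignores adversative conjunctions, and it treats a negated amplifier as a de-amplifier. The tiny lexicons, the window sizes `n_before=4` and `n_after=2`, and the choice to exclude pause tokens from the word count are assumptions made for illustration only.

```python
import math

# Hypothetical toy lexicons; the study used the Jockers dictionary and sentimentr's shifter table.
POLARITY = {"good": 1.0, "great": 1.0, "hope": 1.0, "bad": -1.0, "terrible": -1.0}
NEGATORS = {"not", "never", "no"}
AMPLIFIERS = {"really", "very"}
DEAMPLIFIERS = {"hardly", "barely"}
PAUSES = {",", ";"}


def sentence_polarity(tokens, n_before=4, n_after=2, z=0.8):
    """Simplified polarity score for one sentence given as a list of lower-case tokens."""
    n_words = sum(1 for t in tokens if t not in PAUSES)  # assumption: pauses are not counted
    if n_words == 0:
        return 0.0
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        # Window around the polarized word, cut off at the nearest pause word on each side.
        lo = max(0, k - n_before)
        hi = min(len(tokens) - 1, k + n_after)
        for j in range(k - 1, lo - 1, -1):
            if tokens[j] in PAUSES:
                lo = j + 1
                break
        for j in range(k + 1, hi + 1):
            if tokens[j] in PAUSES:
                hi = j - 1
                break
        cluster = tokens[lo:k] + tokens[k + 1:hi + 1]
        n_neg = sum(t in NEGATORS for t in cluster)
        n_amp = sum(t in AMPLIFIERS for t in cluster)
        n_deamp = sum(t in DEAMPLIFIERS for t in cluster)
        w_neg = n_neg % 2
        w_amp = (1 - w_neg) * z * n_amp                       # amplifiers count only if not negated
        w_deamp = max(-z * (w_neg * n_amp + n_deamp), -1.0)   # bounded below by -1
        total += (1 + w_amp + w_deamp) * POLARITY[tok] * (-1) ** (2 + n_neg)
    return total / math.sqrt(n_words)
```

For example, `sentence_polarity("i really hope".split())` weights the polarized word "hope" by 1.8 before dividing by the square root of the three words, while `sentence_polarity(["not", "good"])` returns a negative score.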

The polarity scores per sentence are averaged to obtain a mean polarity score per item (tweet or post). Per quarter, the positive items ($P$) and negative items ($N$) are counted and a normalized positivity ratio is calculated:

$$PN\ ratio = \frac{P}{P + N}$$
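As a small worked illustration of this ratio, the sketch below counts positive and negative items per quarter and returns P / (P + N). The input format (pairs of a quarter label and a mean polarity score) and the decision to ignore items with a score of exactly zero are assumptions for the example, not details taken from the study.

```python
from collections import defaultdict


def pn_ratio_per_quarter(items):
    """items: iterable of (quarter, mean_polarity) pairs. Returns {quarter: P / (P + N)}."""
    counts = defaultdict(lambda: [0, 0])  # quarter -> [positive count, negative count]
    for quarter, score in items:
        if score > 0:
            counts[quarter][0] += 1
        elif score < 0:
            counts[quarter][1] += 1
        # Assumption: neutral items (score == 0) are left out of the ratio.
    return {q: p / (p + n) for q, (p, n) in counts.items() if p + n > 0}
```

A quarter with two positive items and one negative item would thus receive a ratio of 2/3.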

3.2.2 Volume-related variables

Every Facebook and Twitter page has a different level of activity. We calculated the level of activity per quarter. For Facebook, we assume that the total numbers of posts, likes and shares are indicators of the degree of interaction. For Twitter, the number of tweets per page is calculated. Sometimes a message is sent multiple times to reach an audience across the world, taking time differences into account. We did not correct for this kind of message, since it is a way to increase interaction.

All the variables in our dataset are aggregated per quarter and per company, resulting in an unbalanced panel dataset of 65 companies. There are 14 companies for which we have both Twitter and Facebook variables. We use this subset of 14 companies for our mixed model estimation.
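The aggregation per quarter and per company can be sketched as below. The input format (pairs of a company name and a posting date) and the function names are hypothetical; the sketch only shows how message volumes map onto calendar quarters such as 2014Q1.

```python
from collections import Counter
from datetime import date


def quarter_label(d):
    """Map a date to a calendar-quarter label such as '2014Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def volume_per_quarter(messages):
    """messages: iterable of (company, date) pairs. Returns counts per (company, quarter)."""
    return Counter((company, quarter_label(d)) for company, d in messages)
```

Companies without activity in a given quarter simply have no entry for that quarter, which is one way an unbalanced panel arises.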


3.3 The VAR model

The variables included in our unbalanced panel data frame are assumed to influence each other. Not considering them simultaneously would make our estimations inconsistent. Vector Autoregressive (VAR) models are used for multivariate time series, in particular when there is a need to analyze the relationships between the investigated variables. In each equation, the dependent variable is a linear function of its own past values and those of all the other variables. This explains the term autoregressive (Porter, 2009).

A VAR thus consists of $K$ endogenous variables $y_t = (y_{1t}, y_{2t}, ..., y_{Kt})$ for $k = 1, ..., K$ (Pfaff, 2008). The VAR(p)-process is defined as:

$$y_t = A_1 y_{t-1} + ... + A_p y_{t-p} + u_t,$$

with $u_t$ a $K$-dimensional process with $E(u_t) = 0$ and $E(u_t u_t^T) = \Sigma_u$. The $u_t$ are the stochastic error terms, called impulses in the language of VAR. Using our variables, a VAR(1)-process can be written as:

$$rev_t = \alpha_1 + \phi_{11} rev_{t-1} + \phi_{12} posts_{t-1} + \phi_{13} tweets_{t-1} + \phi_{14} twsent_{t-1} + \phi_{15} likes_{t-1} + u_{rev,t}$$

$$posts_t = \alpha_2 + \phi_{21} rev_{t-1} + \phi_{22} posts_{t-1} + \phi_{23} tweets_{t-1} + \phi_{24} twsent_{t-1} + \phi_{25} likes_{t-1} + u_{posts,t}$$

$$tweets_t = \alpha_3 + \phi_{31} rev_{t-1} + \phi_{32} posts_{t-1} + \phi_{33} tweets_{t-1} + \phi_{34} twsent_{t-1} + \phi_{35} likes_{t-1} + u_{tweets,t}$$

$$twsent_t = \alpha_4 + \phi_{41} rev_{t-1} + \phi_{42} posts_{t-1} + \phi_{43} tweets_{t-1} + \phi_{44} twsent_{t-1} + \phi_{45} likes_{t-1} + u_{twsent,t}$$

$$likes_t = \alpha_5 + \phi_{51} rev_{t-1} + \phi_{52} posts_{t-1} + \phi_{53} tweets_{t-1} + \phi_{54} twsent_{t-1} + \phi_{55} likes_{t-1} + u_{likes,t}$$

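Each variable is a linear function of the lag 1 values of all variables in the set. A VAR(1)-process can also be written as:

$$\xi_t = \alpha + A \xi_{t-1} + u_t,$$

which can be written in matrix form as:

$$\begin{pmatrix} rev_t \\ posts_t \\ tweets_t \\ twsent_t \\ likes_t \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5 \end{pmatrix} + \begin{pmatrix} \phi_{1,1} & \phi_{1,2} & \phi_{1,3} & \phi_{1,4} & \phi_{1,5} \\ \phi_{2,1} & \phi_{2,2} & \phi_{2,3} & \phi_{2,4} & \phi_{2,5} \\ \phi_{3,1} & \phi_{3,2} & \phi_{3,3} & \phi_{3,4} & \phi_{3,5} \\ \phi_{4,1} & \phi_{4,2} & \phi_{4,3} & \phi_{4,4} & \phi_{4,5} \\ \phi_{5,1} & \phi_{5,2} & \phi_{5,3} & \phi_{5,4} & \phi_{5,5} \end{pmatrix} \begin{pmatrix} rev_{t-1} \\ posts_{t-1} \\ tweets_{t-1} \\ twsent_{t-1} \\ likes_{t-1} \end{pmatrix} + \begin{pmatrix} u_{1,t} \\ u_{2,t} \\ u_{3,t} \\ u_{4,t} \\ u_{5,t} \end{pmatrix}$$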
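A VAR(1) system of this form can be estimated one equation at a time with ordinary least squares. The sketch below is a minimal, standard-library illustration of that idea on a toy bivariate system; it is not the estimation code of the study. The helper names `solve` and `fit_var1`, the toy coefficients and the noise-free simulated data are all assumptions chosen so that the estimates can be checked against known values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_var1(series):
    """Equation-by-equation OLS for y_t = alpha + A y_{t-1} + u_t.

    series is a list of K-dimensional observations y_0, ..., y_T.
    Returns (alpha, A): K intercepts and a K x K coefficient matrix (list of lists).
    """
    k = len(series[0])
    lagged = [[1.0] + list(y) for y in series[:-1]]  # regressors: constant and y_{t-1}
    current = series[1:]                             # dependent values y_t
    xtx = [[sum(r[i] * r[j] for r in lagged) for j in range(k + 1)] for i in range(k + 1)]
    alpha, coef = [], []
    for eq in range(k):
        xty = [sum(r[i] * y[eq] for r, y in zip(lagged, current)) for i in range(k + 1)]
        beta = solve(xtx, xty)                       # normal equations X'X beta = X'y
        alpha.append(beta[0])
        coef.append(beta[1:])
    return alpha, coef


# Toy check: simulate a noise-free bivariate VAR(1) and recover its coefficients.
true_alpha = [1.0, 0.5]
true_A = [[0.5, 0.1], [0.2, 0.3]]
y = [[4.0, 0.0]]
for _ in range(8):
    prev = y[-1]
    y.append([true_alpha[i] + sum(true_A[i][j] * prev[j] for j in range(2)) for i in range(2)])
alpha_hat, A_hat = fit_var1(y)
```

Because every equation shares the same regressors (a constant and the lagged values of all variables), the matrix X'X is computed once and reused for each equation.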

There are many benefits of using VAR. First, the method treats all variables as endogenous, which makes VAR a simple method, since one does not have to worry about which variables are endogenous and which are exogenous. Secondly, estimation is simple: the Ordinary Least Squares method can be applied to each equation separately (Porter, 2009). However, VAR modeling also has some challenges and strict assumptions. First, in practice it is often challenging to choose the appropriate lag length. Unless the sample is large, including more lags consumes a lot of degrees of freedom, with the problem of overfitting as a result (Yu, 2009). Secondly, the VAR model assumes that the variables are jointly stationary. This is important because if a time series is non-stationary, the studied behavior is only valid for the time period under consideration. Hence, it would not be possible to draw general conclusions for other periods. “A process is stationary if its mean and variance are constant over time and the value of the covariance between the two time periods depends only on the distance or gap or lag between the two time periods and not the actual time at which the covariance is computed” (Porter, 2009). In most practical situations, weak stationarity suffices. The properties of a weakly stationary stochastic time series are:

$$\text{Mean}: E(Y_t) = \mu$$

$$\text{Variance}: var(Y_t) = E(Y_t - \mu)^2 = \sigma^2$$

$$\text{Covariance}: cov(Y_t, Y_{t+k}) = E[(Y_t - \mu)(Y_{t+k} - \mu)]$$

To test for stationarity, we perform a unit root test known as the augmented Dickey-Fuller (ADF) test. The test starts from the augmented unit root regression

$$\Delta Y_t = \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \mu_t,$$

where $\mu_t$ is a white noise term and the $\Delta Y_{t-i}$ represent the lagged values of the dependent variable. The ADF test evaluates the null hypothesis $H_0: \delta = 0$ (the series has a unit root and is non-stationary). The number of lags is selected by the Schwarz Criterion, up to the maximal number of lags $p_{max}$ proposed by Schwert (Schwert, 1989), with $T$ the number of observations:

$$p_{max} = 12 * \left( \frac{T + 1}{100} \right)^{0.25}$$
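As a quick numerical illustration of this rule, the sketch below evaluates the maximal lag length for a given number of observations. Truncating the result to an integer is an assumption (a lag length has to be a whole number), and the function name is hypothetical.

```python
import math


def schwert_max_lag(n_obs):
    """Maximal ADF lag length: 12 * ((T + 1) / 100) ** 0.25, truncated to an integer."""
    return math.floor(12 * ((n_obs + 1) / 100) ** 0.25)
```

With only a handful of quarterly observations per company, the rule yields a small maximal lag length, which is consistent with the concern about degrees of freedom raised above.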

When building a VAR model, it is important to take causality into account. The idea behind causality is that a cause cannot come after an effect. The past and present may cause the future, but the future cannot cause the past. For causation to occur, variable $X_t$ needs to contain some information about the value $Y_{t+1}$ will take in the immediate future (Granger, 1980). If a variable $X_t$ affects another variable $Y_t$, the former helps to improve predictions of the latter.

The Granger test for causality tests whether $X_t$ Granger-causes $Y_t$ or the other way around. If the former is true, then the regression of $Y_t$ on other variables should improve if we add lagged values of $X_t$ to the estimation function. In that case, $X_t$ is ‘Granger-causal’ for $Y_t$ (Granger, 1980). Granger-causality can be tested using F-tests. Two regressions of the variable $Y_t$ are made, one with and one without lagged values of $X_t$. The F-test checks whether the added lagged values of $X_t$ provide extra explanatory power to the regression:

$$F = \frac{(RSS_R - RSS_{UR})/m}{RSS_{UR}/(n - k)},$$

which follows an F distribution with $m$ and $(n - k)$ degrees of freedom. Here $RSS_R$ and $RSS_{UR}$ are the residual sums of squares of the restricted regression (without lagged values of $X_t$) and the unrestricted regression (with lagged values of $X_t$), $m$ represents the number of lagged terms, $n$ the number of observations in the sample and $k$ the number of parameters estimated in the regression with lagged terms of $X_t$. If the computed F-statistic exceeds the critical value, the null hypothesis of no Granger-causality is rejected, meaning that the lagged values of the explanatory variable belong in the regression. The Granger test is highly sensitive to the presence of unit roots and to the lag length of the regression. Tested variables have to be stationary, and the optimal number of lags can be determined by the Akaike or Schwarz Information Criteria. The choice of criterion affects the results of the causality tests. Based on this information, we decide which variables in our model are endogenous and which are exogenous. We use this information to build the VAR model.
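The F-statistic of the Granger test can be computed directly from the residual sums of squares of the two regressions. The sketch below is a plain transcription of the formula; the function name and the numbers in the usage note are made up for illustration and are not results from the study.

```python
def granger_f(rss_restricted, rss_unrestricted, m, n, k):
    """F-statistic comparing a regression without (restricted) and with (unrestricted) lags of X.

    m: number of lagged X terms, n: number of observations,
    k: number of parameters in the unrestricted regression.
    """
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k))
```

With hypothetical values RSS_R = 120, RSS_UR = 100, m = 2, n = 50 and k = 5, the statistic equals (20 / 2) / (100 / 45) = 4.5, which would then be compared with the critical value of the F distribution with 2 and 45 degrees of freedom.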

The optimal VAR(p) model is selected based on different criteria. As discussed above, this procedure is challenging, as it determines whether or not our model will be overfitted. Therefore, we compare multiple decision criteria in order to find the optimal number of lags. First, based on the Akaike, Schwarz and Hannan-Quinn Information Criteria and the Final Prediction Error, we get suggestions for an optimal VAR model. Then, to evaluate the performance of these models, we test whether they are able to remove all residual autocorrelation. If the latter is present in a dynamic model, estimates are biased and generally inconsistent, and conclusions drawn from such a model would be incorrect (Breusch, 1978). To check for this, we use the Breusch-Godfrey LM-test for serial autocorrelation, which assumes that the error terms follow the autoregressive AR(p) scheme:

$$\mu_t = \rho_1 \mu_{t-1} + \rho_2 \mu_{t-2} + ... + \rho_p \mu_{t-p} + \varepsilon_t,$$

where $\varepsilon_t$ is a white noise term. The null hypothesis $H_0: \rho_1 = ... = \rho_p = 0$ is rejected if the test statistic exceeds the critical chi-square value. In that case, at least one $\rho$ in the error term is statistically different from zero.

Finally, we are interested in the response of a variable to an impulse in another variable in the system. Impulse-Response functions (IRFs) are computed for the optimal VAR model and allow us to study the causality between variables in the system. These functions trace out the response of the dependent variables in the VAR system to shocks in the error terms for several periods into the future. Impulse response functions assume all variables to be stationary. The response of one variable to a unit shock in another variable is often visualized, which gives us an idea of the dynamic interrelationships of the variables included in the VAR model. A shock in one variable has no effect on the other variables if the former does not Granger-cause the set of remaining variables. When variables have different scales, innovations of one standard deviation are considered rather than unit shocks. Since we have no prior knowledge of the ordering of effects in our model, we use Generalized Impulse Response Functions (GIRFs). In these GIRFs, the response of a variable is constructed as an average of what might happen, given the present and past; future shocks are averaged out (Koop et al., 1996). The impulse response analysis is viewed as the centerpiece of this analysis, since it facilitates interpretation of the individual coefficients in the estimated model and allows comparison.
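The basic mechanics of an impulse response can be illustrated for a VAR(1) model, where the response h periods after a shock is obtained by applying the coefficient matrix h times to the shock vector. The sketch below shows this plain unit-shock response only; it is not the generalized impulse response function used in the study, and the coefficient matrix in the usage note is a made-up example.

```python
def matvec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def impulse_response(A, shock, horizons):
    """Responses of all variables of a VAR(1) to an initial shock, for periods 0..horizons."""
    responses = [list(shock)]
    for _ in range(horizons):
        responses.append(matvec(A, responses[-1]))
    return responses
```

For a hypothetical coefficient matrix A = [[0.5, 0.1], [0.2, 0.3]] and a unit shock to the first variable, `impulse_response(A, [1.0, 0.0], 2)` traces how the shock spreads to the second variable and dies out over time, as expected for a stationary system.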


Variable | N | Overall mean | Overall deviation | Min. | Median | Max.
rev | 140 | 19517 | 15579.43 | 1227 | 14760 | 75872
fbsent | 140 | 0.8629 | 0.1231829 | 0.3333 | 0.9025 | 1.0000
twsent | 140 | 0.8445 | 0.1174342 | 0.4933 | 0.8578 | 1.0000
posts | 140 | 5.181 | 1.631202 | 0.000 | 5.179 | 7.887
tweets | 140 | 4.697 | 1.804298 | 2.197 | 4.069 | 9.780
likes | 140 | 7.406 | 2.452273 | 0.000 | 7.718 | 11.555

Table 4: Overall descriptive statistics

4 Results and Discussion

4.1 Descriptive Statistics

Considering previous literature, there is evidence of a bidirectional relationship between Twitter sentiment and revenue, through consumer evaluation and the purchase decision. The volume of WoM influences awareness, also leading to more revenue. These interconnections were demonstrated for tweets (Rui, 2013; Oh, 2016) and blog posts (Duan, 2008). There was empirical evidence that sales were influenced by Facebook likes and the unique number of posts (Ding, 2016; Oh, 2016). Facebook likes are also related to revenue (Ding, 2016). Forecasting research made use of these relationships by predicting sales using Twitter sentiment and volume (Asur & Huberman, 2005; Lassen, 2014), Facebook likes, shares and posts (Boldt, 2016) and WoM on blogs (Gruhl, 2005; Dhar & Chang, 2007; Onishi & Manchanda, 2012). Due to the nature of these variables, high correlations were expected between the volumes of Facebook posts and shares, so the latter was left out of our analysis. We used the natural logarithm of Facebook posts, tweets and likes. Descriptive statistics of the variables used in the analysis are shown in tables 4, 5 and 6.

Revenue correlates most strongly with the volume of posts, with a Spearman correlation coefficient of 0.42. There is also a positive relationship between revenue and the volume of tweets and likes, indicated by Spearman correlations of 0.27 and 0.32 respectively. The sentiment variables have negative relationships with revenue, with a correlation of -0.19 for both Twitter sentiment and Facebook sentiment. The relationship between Facebook posts and their sentiment is less strong (-0.15) than the relationship between tweets and their sentiment (-0.17). Finally, the number of likes is related to posts (0.36) and to tweets (0.31). An overview of the correlations is given in table 7.

4.2 Facebook & Twitter model

4.2.1 Tests for stationarity

Using the augmented Dickey-Fuller (ADF) test, the variables were first tested for stationarity. At the 5% significance level, the critical value of the t-statistic is −3.00. Based on the t-statistics in table 8, the null hypothesis of non-stationarity can be rejected for the revenue variable: revenue does not contain a unit root. Facebook sentiment is non-stationary, as is Twitter sentiment. With t-statistics of -3.966 for posts and -4.5759 for tweets, we can reject the null hypothesis of non-stationarity for posts and tweets. Finally, the null hypothesis is not rejected for likes, which means this variable is non-stationary.

Since a VAR model assumes stationarity, we try to solve the non-stationarity of Facebook and Twitter

16

Page 23: PREDICTING REVENUE WITH SOCIAL MEDIA DATA · PREDICTING REVENUE WITH SOCIAL MEDIA DATA: “A COMPARISON BETWEEN TWITTER AND FACEBOOK FANPAGES” Word count: 8913 Eline Debakker Student

Industry N Mean Standard Dev. Min. Median Max.Alphabet Technology 5 rev 20842 1440.38 18675 21329 22451Google fbsent 0.8540 0.0370657 0.8125 0.8419 0.9000

twsent 0.8078 0.1500823 0.6000 0.8000 1.0000posts 5.506 0.8778123 4.779 5.030 6.613tweets 2.746 0.186321 2.565 2.639 2.996likes 6.556 0.733597 5.771 6.377 7.498

Apple Technology 10 rev 53712 12260.38 42123 50081 75872fbsent 0.7073 0.0300224 0.6651 0.7190 0.7486twsent 0.7357 0.04240067 0.6644 0.7362 0.8000posts 7.725 0.0883445 7.550 7.732 7.887tweets 7.590 1.966115 2.890 8.708 9.074likes 11.05 0.2463796 10.70 11.06 11.45

AT&T Telecom 10 rev 36872 4060.21 32575 36765 42119fbsent 0.7801 0.04797244 0.7087 0.7880 0.8493twsent 0.7831 0.09635818 0.5385 0.8038 0.8782posts 6.634 0.9531906 4.997 6.670 7.642tweets 5.114 1.468487 2.773 5.359 6.866likes 7.063 0.8225566 5.730 7.120 7.997

Cardinal Healthcare 11 rev 27312 3733.668 21427 27548 32039Health fbsent 0.9267 0.04155306 0.8785 0.9194 1.0000

twsent 0.8447 0.06290836 0.7308 0.8398 0.9403posts 4.689 0.6468915 2.890 4.779 5.407tweets 5.658 0.3242765 5.112 5.649 6.217likes 8.792 1.901367 3.401 9.584 10.169

Cisco Systems Technology 11 rev 12133 502.4364 11155 12137 12843fbsent 0.9564 0.01545537 0.9318 0.9529 0.9809twsent 0.9311 0.07391777 0.7778 0.9444 1.0000posts 6.260 0.1549959 5.964 6.267 6.500tweets 3.844 0.9046741 3.091 3.367 5.740likes 9.243 0.5763212 8.494 9.181 10.664

Comcast Telecom 11 rev 18424 1306.334 16791 18669 21319fbsent 0.9410 0.04123029 0.8333 0.9502 0.9786twsent 0.8393 0.08429541 0.7333 0.8571 0.9643posts 5.046 0.8424461 2.773 5.094 5.958tweets 3.902 0.5597622 2.944 4.043 4.522likes 4.675 1.256172 1.609 5.293 5.707

Disney Media 11 rev 13146 982.5374 11649 13101 15244fbsent 0.9042 0.03809248 0.8185 0.9118 0.9565twsent 0.8381 0.02702855 0.7897 0.8501 0.8738posts 5.259 0.4325906 4.522 5.268 5.861tweets 7.619 0.3594895 7.028 7.739 8.013likes 7.996 1.541024 6.500 7.393 10.997

Ford Motors Motors 11 rev 36980 1907.777 33900 37263 40251fbsent 0.8315 0.09043918 0.6667 0.8235 1.0000twsent 0.8535 0.09903704 0.6471 0.8750 1.0000posts 3.360 0.6858644 2.197 3.258 4.585tweets 2.986 0.03176739 2.890 2.996 2.996likes 5.922 1.857211 2.890 5.805 9.045

Table 5: Summary statistics (1/2)

17

Page 24: PREDICTING REVENUE WITH SOCIAL MEDIA DATA · PREDICTING REVENUE WITH SOCIAL MEDIA DATA: “A COMPARISON BETWEEN TWITTER AND FACEBOOK FANPAGES” Word count: 8913 Eline Debakker Student

Industry N Mean Standard Dev. Min. Median Max.JetBlue Transportation 6 rev 1525 119.8015 1349 1526 1687Airways fbsent 0.8651 0.1633456 0.6667 0.9286 1.0000

twsent 0.9006 0.03387919 0.8404 0.9150 0.9286posts 1.524 0.8915847 0.000 1.701 2.565tweets 4.688 1.412051 2.890 5.276 6.050likes 2.443 1.495048 0.000 2.593 4.248

Kroger Food 11 rev 27506 3989.615 23105 25539 34604& Drugstores fbsent 0.9437 0.02204527 0.8986 0.9516 0.9695

twsent 0.9211 0.06583025 0.8158 0.9211 1.0000posts 6.806 0.1378118 6.981 6.987 7.309tweets 3.473 0.5428782 2.197 3.714 4.007likes 8.036 0.347308 7.287 8.038 8.592

Nike Apparel 11 rev 7858 578.4626 6972 7779 9061fbsent 0.8167 0.2140497 0.3333 0.9167 1.0000twsent 0.5978 0.06491797 0.4933 0.6156 0.7297posts 3.415 1.102447 1.609 3.664 5.081tweets 5.809 1.730638 4.533 5.124 9.780likes 7.626 3.324417 1.946 8.297 11.555

Oracle Technology 11 rev 9500 963.2544 8448 9307 11320Systems fbsent 0.9053 0.05509861 0.8182 0.8958 0.9901

twsent 0.9709 0.041158 0.9000 1.0000 1.0000posts 4.904 1.209875 2.996 4.956 6.471tweets 2.953 0.03938844 2.890 2.944 2.996likes 6.315 2.280299 1.609 7.044 8.029

Salesforce Technology 10 rev 9500 267.8583 8448 9307 11320fbsent 0.9053 0.03978489 0.8182 0.8958 0.9901twsent 0.9709 0.05721545 0.9000 1.0000 1.0000posts 4.904 0.35777 2.996 4.956 6.471tweets 2.953 0.1349149 2.890 2.944 2.996likes 6.315 2.036299 1.609 7.044 8.029

Winstream Telecoms 11 rev 1425 48.33554 1345 1427 1499Holdings fbsent 0.6664 0.1022929 0.4900 0.6436 0.7988

twsent 0.8777 0.087678 0.7333 0.8966 0.9796posts 5.923 0.1863631 5.609 5.894 6.242tweets 4.843 1.075647 2.890 4.984 6.221likes 8.441 1.309018 5.768 8.656 9.697

Table 6: Summary statistics (2/2)

rev fsent twsent posts tweets likesrev 1fsent -0.19 1twsent -0.19 0.23 1posts 0.42 -0.15 0.02 1tweets 0.24 -0.26 -0.32 0.14 1likes 0.26 -0.11 -0.15 0.67 0.30 1

Table 7: Correlation Analysis

18

Page 25: PREDICTING REVENUE WITH SOCIAL MEDIA DATA · PREDICTING REVENUE WITH SOCIAL MEDIA DATA: “A COMPARISON BETWEEN TWITTER AND FACEBOOK FANPAGES” Word count: 8913 Eline Debakker Student

Variable    t-statistic  p-value
revenue     -3.4012      0.05
fbsent      -2.0077      0.573
twsent      -2.8145      0.2375
posts       -3.4664      0.0480
tweets      -4.4737      0.01
likes       -1.4889      0.6245
fd.fbsent   -2.8867      0.208
sd.fsent    -2.9372      0.1877
fd.twsent   -4.2727      0.01
fd.likes    -3.5305      0.04242

Table 8: Augmented Dickey-Fuller (ADF) tests

sentiment by transforming these variables. After first differencing, Twitter sentiment no longer contains
a unit root (t = -4.2727, p = 0.01), so we can reject the null hypothesis of non-stationarity. Neither the
first nor the second difference of Facebook sentiment solves the non-stationarity problem (p = 0.208 and
p = 0.1877 respectively), so we omit the variable from further analysis.
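The differencing step described above can be illustrated with a minimal sketch. The function name and the example series are hypothetical and not taken from the dataset; the sketch only shows how a first difference (fd) and a second difference (sd) of a quarterly series are formed.

```python
def difference(series, order=1):
    """Return the order-th difference of a sequence of numbers."""
    result = list(series)
    for _ in range(order):
        # Each pass replaces the series by the change between consecutive periods.
        result = [current - previous for previous, current in zip(result, result[1:])]
    return result


# Hypothetical quarterly values (illustration only, not from the thesis data)
example = [1, 4, 9, 16, 25]
fd_example = difference(example)            # first difference
sd_example = difference(example, order=2)   # second difference
print(fd_example, sd_example)
```

Each round of differencing shortens the series by one observation, which matters in a short quarterly panel such as this one.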

4.2.2 Granger-Causality tests

We test for causality in our model to find out whether lagged values of one variable help to forecast
another variable. To do so, we perform Granger-causality tests on our system. The outcome of these
tests depends on the number of lagged terms included in the regressions. The optimal number of lags is
determined using information criteria; based on the AIC, we set the number of lags to 5. Results of the
Granger-causality tests can be found in table 9. Revenue is Granger-caused by tweets (p = 0.03) and by
posts (p = 0.04) at the 5% significance level. Since lagged revenue also Granger-causes tweets (p = 0.03),
there is a feedback system between revenue and the volume of tweets. At the 10% significance level,
posts Granger-cause tweets (p = 0.08) and lagged likes Granger-cause posts (p = 0.05). Based on the
results of the Granger-causality tests, we treat revenue, tweets, posts and likes as endogenous variables.
Twitter sentiment is added as an exogenous variable.
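For reference, the Granger-causality test for one pair of variables can be written out as follows. This is the textbook bivariate form, given as a sketch: the symbols y, x, \alpha, \beta, p and T are notation introduced here rather than in the source, and the implementation actually used for the panel may differ in detail.

```latex
% Unrestricted regression of y on p lags of itself and p lags of x
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \, y_{t-i} + \sum_{i=1}^{p} \beta_i \, x_{t-i} + \varepsilon_t

% Null hypothesis: x does not Granger-cause y
H_0 : \beta_1 = \beta_2 = \dots = \beta_p = 0

% F-statistic comparing the restricted (no lags of x) and unrestricted residual sums of squares,
% with T the number of observations used in the regression
F = \frac{(RSS_r - RSS_u) / p}{RSS_u / (T - 2p - 1)}
```

With p = 5, as chosen above, a small p-value for this F-statistic leads to rejecting H_0, which is what is meant by "x Granger-causes y".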

4.2.3 Model selection

Next, we build a VAR model. As already mentioned, the biggest challenge in VAR modelling is the selection
of an optimal lag length. The decision on lag length is based on multiple selection criteria, presented in
table 10. The Akaike (AIC) and Hannan-Quinn (HQ) Information Criteria and the Final Prediction Error
(FPE) prefer the VAR(2). The Schwarz (SC) Information Criterion, however, suggests a model with only
one lag. We therefore perform the Breusch-Godfrey LM-test for serial autocorrelation on both models.
The null hypothesis is that there is no serial correlation up to the tested lag order. Rejecting this
hypothesis means that at least one of the lagged residuals is statistically different from zero, so the
model still has residual autocorrelation. The results of the Breusch-Godfrey LM-test are presented in
table 10. For the VAR(1), the p-values at lag orders 1 to 3 are smaller than 0.05 (those at orders 4 and 5
are 0.0504 and 0.0787), so at the 5% significance level the residuals of the VAR(1) model still show
autocorrelation. Since all p-values of the VAR(2) model are larger than 0.05, we cannot reject the null
hypothesis, and thus no residual autocorrelation was found.
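The decision rule applied to the LM-test can be sketched as follows, using the p-values reported in table 10. The function and variable names are ours, introduced for illustration only.

```python
# p-values of the LM-test from table 10, keyed by lag order
lm_p_values = {
    "VAR(1)": {1: 0.001247, 2: 0.03946, 3: 0.02878, 4: 0.0504, 5: 0.07871},
    "VAR(2)": {1: 0.8762, 2: 0.1756, 3: 0.2793, 4: 0.3767, 5: 0.6011},
}


def rejected_orders(p_values, alpha=0.05):
    """Return the lag orders at which 'no residual autocorrelation' is rejected."""
    return sorted(lag for lag, p in p_values.items() if p < alpha)


for model, p_values in lm_p_values.items():
    orders = rejected_orders(p_values)
    verdict = "residual autocorrelation remains" if orders else "no residual autocorrelation found"
    print(model, orders, verdict)
```

A model is retained only if the list of rejected orders is empty, which here holds for the VAR(2) but not for the VAR(1).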

In Figure 2, estimates, standard errors and t-statistics of the coefficients in the VAR(2) model are
presented. In a final step, generalized impulse response functions have been calculated to measure the
response of one variable as a consequence of a shock in another. This will bring more clarity on the
interpretation of these coefficients.

Cause -> Effect         F-statistic  p-value
post -> rev             2.52409597   0.04476106
tweets -> rev           2.71101234   0.03355857
fd.twsent -> rev        0.45049696   0.77184166
fd.likes -> rev         0.21975424   0.92694076
rev -> post             0.43884204   0.78030265
tweets -> post          0.38510417   0.81891318
fd.twsent -> post       0.94333677   0.44171157
fd.likes -> post        2.42880379   0.05180839
rev -> tweets           2.89561340   0.02521703
post -> tweets          2.12195233   0.08264406
fd.twsent -> tweets     0.44676642   0.77455210
fd.likes -> tweets      0.32806176   0.85862610
rev -> fd.twsent        0.13970979   0.96715188
post -> fd.twsent       0.05497786   0.99429516
tweets -> fd.twsent     1.28271511   0.28101186
fd.likes -> fd.twsent   0.60631800   0.65888664
rev -> fd.likes         0.43031762   0.78647642
post -> fd.likes        0.27864007   0.89126788
tweets -> fd.likes      0.46362470   0.76229025
fd.twsent -> fd.likes   0.35513970   0.83999063

Table 9: Granger causalities tests

        VAR(1)               VAR(2)
Lags    LM-Stat   Prob       LM-Stat   Prob
1       51.864    0.001247   17.151    0.8762
2       68.882    0.03946    59.173    0.1756
3       99.935    0.02878    81.692    0.2793
4       124.28    0.0504     103.83    0.3767
5       147.97    0.07871    120.34    0.6011

Table 10: LM-test for residual autocorrelation

Figure 2: Estimation output of the VAR(2) model

Figure 3: Impulse response functions

4.2.4 Impulse response functions

In Figure 3, graphs of the generalized impulse response functions of the VAR(2) model are presented,
with the 95% confidence interval drawn around each function. Before examining the effect of one
variable on another, we first check whether the effect is significant. If the 95% confidence interval,
[mean - 1.96 * SE; mean + 1.96 * SE], does not include 0, the effect is significant. In other words, if the
zero line lies between these bounds, the effect is insignificant. For significant results, we can quantify
the impact of a shock in one variable on another. Figures 4 and 5 present the values of the response
functions following a shock of one standard error (SE) in another variable.

In Figure 5, we notice a significant response of tweets to a shock in revenue in period 2. Since period 1
represents the immediate effect and our data are quarterly, this effect occurs in the next quarter. In
Figure 5, we find that the response of the volume of tweets to the shock in revenue equals 0.21 percent.
Based on these findings, we cannot reject H8: an increase in revenue positively affects the volume of
tweets. An increase in revenue makes more people talk about a company and its products, thus raising
awareness in the following periods. However, the effect of this shock is not persistent and dies out
fairly quickly. Still, if a company does well, this shows in the volume of tweets posted by fans. A shock
of one standard error in the volume of tweets affects the sentiment of tweets, with the response reaching
a bottom after three quarters at a decrease in the positivity rate of tweets of 1.6 percentage points.
Finally, a decrease of one standard error in the volume of Facebook posts causes a decrease in the
volume of likes of 0.47 percent. Fewer posts may signal less interaction on a fanpage, which may
decrease brand awareness and thereby result in fewer likes.

The other impulse response functions lack significance, which keeps us from making useful comparisons
between the Twitter and Facebook metrics. Therefore, we cannot test H1 and H2. The impact of social
media metrics on revenue, and vice versa, might differ per company, which makes general conclusions
impossible. For example, one company can have a very active Facebook fanpage and little Twitter
activity, while another has multiple Twitter fanpages and little activity on Facebook. When both
companies are in our sample, the effects of their social media metrics might cancel each other out. To
verify this, we study the impact of the Facebook and Twitter variables separately.
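The significance rule used for the impulse responses above can be sketched in a few lines. The response values and standard errors in the example are hypothetical and only illustrate the check; the estimated values are those reported in the figures.

```python
def confidence_interval(response, standard_error, z=1.96):
    """Return the (lower, upper) bounds of the interval response +/- z * SE."""
    return (response - z * standard_error, response + z * standard_error)


def is_significant(response, standard_error, z=1.96):
    """An impulse response is significant when its interval excludes zero."""
    lower, upper = confidence_interval(response, standard_error, z)
    return lower > 0 or upper < 0


# Hypothetical (response, standard error) pairs, for illustration only
examples = [(0.30, 0.10), (0.30, 0.20), (-1.00, 0.40)]
for response, se in examples:
    print(response, se, confidence_interval(response, se), is_significant(response, se))
```

The same check is applied period by period, so a response can be significant in one quarter and insignificant in the next.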


Figure 4: Impulse response values (Revenue and Posts)


Figure 5: Impulse response values (Tweets and Facebook likes)


Variable  N    Overall mean  Overall deviation  Minimum  Median  Maximum
rev       451  9.06          1.22               7.021    9.221   11.787
tweets    451  3.82          1.69               0.000    3.823   9.780
twsent    451  0.83          0.17               0.000    0.8667  1.000

Table 11: Descriptive statistics of Twitter variables

         rev     tweets  twsent
rev      1
tweets   0.14    1
twsent   -0.17   -0.03   1

Table 12: Correlation analysis Twitter variables

4.3 Twitter model

The large body of literature using Twitter data indicates that for some industries there is a link with
real-world outcomes. We want to test whether this link also exists for Fortune 500 companies. An
unbalanced panel dataset was created for 31 Fortune 500 companies. This dataset includes the 14
companies used in the mixed Facebook and Twitter model, as well as 17 other Fortune 500 companies for
which only Twitter fanpages were found. Tables 11 and 12 give a summary of the variables. We
transformed the Twitter variables by taking the natural logarithm. We will verify whether the
relationships between the Twitter variables and revenue found in the previous model also exist when
Facebook data are not included in the model.

4.3.1 Tests for stationarity

Before building the model, we again check for stationarity. The maximum number of lags was determined
using the Schwert criterion (Schwert, 1989), which gives pmax = 17. Comparing the test statistics of
revenue, tweets and Twitter sentiment to a critical value of -3, we find that none of the variables contains
a unit root, since the t-statistics are -4.2244, -5.2214 and -3.8326 respectively. We can therefore use all
the variables in our model.
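The text does not spell out how the maximum lag is computed. The reported pmax = 17 for the 451 observations of the Twitter sample is consistent with the rule of thumb commonly attributed to Schwert (1989), pmax = floor(12 * (T / 100)^(1/4)). The sketch below assumes that this is the rule that was applied; the function name is ours.

```python
import math


def schwert_max_lag(n_observations):
    """Maximum ADF lag under the assumed rule floor(12 * (T / 100) ** 0.25)."""
    return math.floor(12 * (n_observations / 100) ** 0.25)


# Number of observations in the Twitter sample (table 11)
print(schwert_max_lag(451))
```

Under the same assumption, a sample of 259 observations yields a maximum lag of 15.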

4.3.2 Granger-Causality tests

The Granger-causality tests were executed using 4 lags. Table 13 shows the results. Only one of the tests
is significant: revenue Granger-causes Twitter sentiment (p = 0.002). This means that lagged values of
revenue help to predict Twitter sentiment. Therefore, we decide to treat Twitter sentiment and revenue
as endogenous variables in our model. Since no significant results were found for the Granger tests
involving the volume of tweets, we include tweets as an exogenous variable in the model.

Cause -> Effect     F-statistic  p-value
tweets -> rev       0.1629323    0.95702778
twsent -> rev       0.1999137    0.93834786
rev -> tweets       0.9589787    0.42979518
twsent -> tweets    1.0435881    0.38425873
rev -> twsent       4.2572951    0.00216686
tweets -> twsent    1.7340473    0.14140424

Table 13: Granger-causality tests for Twitter variables
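The way the Granger results of table 13 translate into the choice of endogenous and exogenous variables can be sketched as follows. The rule coded here, that every variable appearing in at least one significant pair is treated as endogenous, is our reading of the text and not a stated procedure; the function names are ours.

```python
# p-values from table 13, keyed as (cause, effect)
granger_p_values = {
    ("tweets", "rev"): 0.95702778,
    ("twsent", "rev"): 0.93834786,
    ("rev", "tweets"): 0.42979518,
    ("twsent", "tweets"): 0.38425873,
    ("rev", "twsent"): 0.00216686,
    ("tweets", "twsent"): 0.14140424,
}


def significant_pairs(p_values, alpha=0.05):
    """Return the (cause, effect) pairs whose p-value is below alpha."""
    return [pair for pair, p in p_values.items() if p < alpha]


def split_variables(p_values, alpha=0.05):
    """Assumed rule: variables in any significant pair are endogenous, the rest exogenous."""
    variables = sorted({name for pair in p_values for name in pair})
    endogenous = sorted({name for pair in significant_pairs(p_values, alpha) for name in pair})
    exogenous = [name for name in variables if name not in endogenous]
    return endogenous, exogenous


endogenous, exogenous = split_variables(granger_p_values)
print(endogenous, exogenous)
```

With the values of table 13, this reproduces the choice made in the text: revenue and Twitter sentiment are endogenous, and the volume of tweets is exogenous.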


Criterion  1             2              3             4
AIC(n)     -4.906291681  -4.945157843*  -4.942677340  -4.940606227
HQ(n)      -4.877344634  -4.901737273*  -4.884783247  -4.868238609
SC(n)      -4.832867813  -4.835022042*  -4.795829606  -4.757046558
FPE(n)     0.007399886   0.007117814*   0.007135523   0.007150369

Table 14: Lag selection criteria for Twitter model

        VAR(2)
Lags    LM-stat  p-value
1       2.0767   0.7217
2       12.778   0.1197
3       19.013   0.08822
4       23.103   0.111

Table 15: LM-test for residual autocorrelation in the Twitter model

4.3.3 Model selection

Based on the Akaike and Schwarz Information Criteria, the Hannan-Quinn criterion and the Final
Prediction Error (table 14), the optimal VAR model is VAR(2). The LM-statistics are computed to test for
serial autocorrelation. Since all p-values of the VAR(2) exceed 0.05 (table 15), we cannot reject the null
hypothesis: no residual autocorrelation is left in the VAR(2). The coefficients of the Twitter VAR(2)
model can be found in Figure 6. The graphs of the generalized impulse response functions provide
insight into the impact of one variable on another. The graphs can be found in Figure 7; an overview of
the impulse values and standard errors is displayed in Figure 8.
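The lag selection from table 14 amounts to picking, for each criterion, the lag length with the smallest value. A minimal sketch with the reported values follows; the function and variable names are ours.

```python
# Criterion values from table 14 for lag lengths 1 to 4 (lower is better for all four)
criteria = {
    "AIC": [-4.906291681, -4.945157843, -4.942677340, -4.940606227],
    "HQ": [-4.877344634, -4.901737273, -4.884783247, -4.868238609],
    "SC": [-4.832867813, -4.835022042, -4.795829606, -4.757046558],
    "FPE": [0.007399886, 0.007117814, 0.007135523, 0.007150369],
}


def best_lag(values):
    """Return the 1-based lag length with the smallest criterion value."""
    return min(range(len(values)), key=lambda index: values[index]) + 1


selected = {name: best_lag(values) for name, values in criteria.items()}
print(selected)
```

All four criteria point to two lags here, so no tie-breaking between criteria is needed for the Twitter model.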

4.3.4 Impulse Response functions

Significant results were only found for the response of revenue to shocks in Twitter sentiment. A shock
of one standard error causes an increase in revenue of 0.08 percent after two quarters. This confirms H3
and the existing literature: Twitter sentiment indeed has an influence on quarterly revenue. If Twitter
sentiment increases, more consumers are positive about a company and its products, which ultimately
influences customers' purchase decisions. This effect can reinforce itself, since other users will in turn
base their purchase decisions on the more positive tweets.


Figure 6: Estimation output of the Twitter model


Figure 7: Impulse response functions of the Twitter model

Figure 8: Impulse response values (Revenue and Twitter sentiment)


Variable  N    Overall mean  Overall deviation  Minimum  Median  Maximum
rev       259  9.16          1.10               6.775    9.337   11.237
posts     259  4.91          1.60               1.099    4.912   8.178
fbsent    259  0.87          0.14               0.000    0.9167  1.0000
likes     259  7.48          2.74               0.000    7.498   12.476

Table 16: Descriptive statistics of Facebook variables

         rev     posts   fbsent  likes
rev      1
posts    0.26    1
fbsent   -0.14   -0.09   1
likes    0.25    0.61    -0.12   1

Table 17: Correlation analysis of Facebook variables

4.4 Facebook model

The purpose and structure of a Facebook fanpage are very different from those of a Twitter page.
Although anyone can access these pages, the audience of a Facebook fanpage can be very different from
that of a Twitter page as well. We want to test whether the behavior of Facebook fans can be related to
revenue too. Therefore, we create a new panel dataset with 33 Fortune 500 companies, 14 of which are
also included in the sample used in the mixed model. For these companies, we calculated the quarterly
volume of posts and likes, as well as the rate of positive posts and comments. To ease interpretation, we
use the natural logarithm of the variables revenue, posts and likes. An overview of the variables is
provided in table 16. The correlation matrix shows higher correlations between all variables (table 17).

4.4.1 Tests for stationarity

The stationarity of the variables included in the model is checked using the augmented Dickey-Fuller
(ADF) test. Since we have 259 observations, we set the maximum number of lags to 15. The critical value
of the t-statistic is -3.00. For revenue, posts, fbsent and likes, we find t-statistics of -3.7913, -3.6048,
-4.313 and -3.5212 respectively (table 18). We can reject the null hypothesis of non-stationarity for each
of these variables. This means we can include all variables in the model without transformations.

4.4.2 Granger-Causality tests

Since all variables are stationary, no transformations are necessary to build our Facebook dataset.
However, we still have to determine which variables are endogenous. Based on the AIC, we performed
Granger-causality tests with 7 lags. Results can be found in table 19. Facebook sentiment is
Granger-caused by the volume of posts (p = 0.025). Lagged values of the volume of likes also
Granger-cause sentiment (p = 0.021).

Variable  t-statistic  p-value
rev       -3.7913      0.01992
posts     -3.6048      0.03316
fbsent    -4.313       0.01
likes     -3.5212      0.0412

Table 18: Augmented Dickey-Fuller (ADF) tests for Facebook variables


Cause -> Effect     F-statistic  p-value
posts -> rev        0.5328881    0.78302447
fbsent -> rev       0.1450795    0.98987568
likes -> rev        0.4924440    0.81372515
rev -> posts        0.8904277    0.50250861
fbsent -> posts     1.3332033    0.24297745
likes -> posts      1.6414187    0.13641281
rev -> fbsent       0.7353785    0.62157675
posts -> fbsent     2.4601645    0.02505779
likes -> fbsent     2.5335948    0.02136247
rev -> likes        0.9617110    0.45185342
posts -> likes      0.7590063    0.60283833
fbsent -> likes     1.8515196    0.08994718

Table 19: Granger-causality tests for Facebook variables

Criterion  1             2            3            4
AIC(n)     -3.38040425   -3.4408816*  -3.43646323  -3.41986499
HQ(n)      -3.29561957   -3.3052261*  -3.24993692  -3.18246786
SC(n)      -3.16971982*  -3.1037865   -2.97295747  -2.82994856
FPE(n)     0.03403423    0.0320385*   0.03218372   0.03272823

Table 20: Lag selection criteria for the Facebook model

Clearly, there is a relationship between the volume of posts, likes and the Facebook sentiment. In our final

VAR model, we included these variables as endogenous. Revenue is included as an exogenous variable, since

the Granger-tests showed no significant relationships between revenue and the other variables.

4.4.3 Model selection

To determine the optimal VAR, we first determine candidate lag lengths based on the Akaike Information
Criterion, the Final Prediction Error, the Hannan-Quinn criterion and the Schwarz Criterion (table 20).
The latter suggests a VAR(1) model, while all the others have optimal scores for the VAR(2) model. The
LM-test for residual autocorrelation was used to verify that no autocorrelation was left in these models
(table 21). We rejected the hypothesis of no correlation in the residual terms of the VAR(1) model. The
VAR(2) model, on the other hand, shows no significant serial correlation in the error terms at the 5%
level. An overview of the coefficients of the estimated VAR(2) model is given in Figure 9.

4.4.4 Impulse Response functions

Based on this VAR model, there are few significant effects of impulses of one variable on another. The
impulse response functions and values can be found in figures 10, 11 and 12. Consistent with the studied
literature (Oh et al., 2017), we find that an increase in the volume of likes has a positive impact on the
volume of posts and vice versa. An innovation (shock) of one standard error in the volume of likes
causes an immediate increase in the volume of posts of 61 percent. This effect weakens over time: after
4 quarters it is only 39 percent. Vice versa, a shock of one standard error in the volume of posts causes
an immediate increase in the volume of likes of 1.34 percent. After 4 quarters, this effect has diminished
to 0.72 percent. We can neither confirm nor reject H4, H6, H8 and H10, since we could not show that
revenue is linked to any of the Facebook variables. However, both the mixed model and the Facebook
model show a link between the volume of posts and likes. Prior research did find a link between the
volume of likes and revenue. Further research that includes more companies, or more data on a specific
company, could confirm this relationship for Fortune 500 companies.

        VAR(1)                VAR(2)
Lags    LM-test   p-value     LM-test   p-value
1       29.927    0.0004514   16.155    0.06372
2       48.439    0.0001295   26.735    0.08409
3       62.308    0.0001304   36.895    0.09704
4       76.083    0.0001084   47.164    0.1008

Table 21: LM-test for residual autocorrelation in the Facebook model

Figure 9: Estimation output of Facebook VAR(2) model

Figure 10: Impulse response functions of the Facebook model


Figure 11: Impulse response values (Posts and Facebook sentiment)

Figure 12: Impulse response values (Likes)


4.5 Conclusion

Based on the three models built in this study (a Facebook model, a Twitter model and a mixed model), we
found a significant influence of quarterly revenue on the volume of tweets. We also found a link between
the volume of tweets and Twitter sentiment, as well as a link between the volume of posts and likes. The
significance of the latter is confirmed when only Facebook data are included in the model. When only
the Twitter sample is used, an immediate significant response of quarterly revenue to Twitter sentiment
is found. Nevertheless, since this result is not significant in the mixed model, we cannot test whether it
also holds in the wider social ecology. A possible explanation for the limited significance in the mixed
model is the insufficient amount of data available to us. Still, our results provide interesting information
on the link between revenue and Twitter social media metrics. This could explain why prior research
has predominantly used Twitter as its source of social media data. However, since Facebook is to this
day the largest social media platform, with more than 70% of its users visiting the network daily, more
research into the link between Facebook and business outcomes might be needed (Molla, 2016).

The results of this study show the importance of monitoring WOM on Twitter. Obviously, when a company
does well, sentiment will be more positive. In addition, companies will try to make their followers send
more positive tweets into the world. This might be easier on company-hosted pages than on fan-hosted
pages. Companies might try to identify the most influential users, whose positive tweets will lead others
to send more positive tweets as well. This mechanism is typical for communities. People are more
inclined to tweet positively due to self-enhancement. Especially when the community agrees with their
opinion, people will be motivated to post similar feelings, which increases sentiment (Oh et al., 2017).
Increased Twitter sentiment will result in increased revenue in the next quarter.


5 Limitations and future research

This master’s dissertation investigated the relationships between Facebook and Twitter social media
metrics and revenue for Fortune 500 companies. Despite the promising design of this study, drawing
conclusions was hampered by several limitations. Firstly, the difference in the amount of data available
on Facebook and Twitter fanpages created a sample bias which influenced the results of the mixed model.
Extending the Facebook dataset with fanpages that are organized as a Facebook Group or a Community
could affect the results. Furthermore, a larger mixed sample could make the effect of Twitter sentiment
on revenue found in the Twitter model significant in the mixed model as well. Secondly, due to the
limited availability of data, we could not control for industry in this study, nor for the host of the
fanpage. Controlling for these variables might be interesting since the target audience might differ per
sector or host. For example, some industries will have fanpages consisting mostly of customers, while
others have more shareholders. These different audiences might require a different type of content,
which can influence the social media metrics. Finally, social media fanpages may also influence other
business outcomes such as stock price movements. Future research might expose different dynamics of
these fanpages.


References

Albert, N., Merunka, D., and Valette-Florence, P. (2008). When consumers love their brands: Exploring the

concept and its dimensions. Journal of Business Research, 61(10):1062–1075.

Archak, N., Ghose, A., and Ipeirotis, P. G. (2011). Deriving the Pricing Power of Product Features by

Mining Consumer Reviews. Management Science, 57(8):1485–1509.

Asur, S. and Huberman, B. A. (2013). Predicting the Future with Social Media. Applied Energy, 112:1536–

1543. arXiv: 1003.5699.

Bagozzi, R. P. and Dholakia, U. M. (2006). Antecedents and purchase consequences of customer participation

in small group brand communities. International Journal of Research in Marketing, 23(1):45–61.

Barnes, N. G. (2014). Social commerce emerges as big brands position themselves to turn “follows”, “likes”

and “pins” into sales. Marketing Management Association Annual Spring Conference Proceedings, Chicago,

IL, pages 8–13.

Boldt, L., Vinayagamoorthy, V., Winder, F., Melanie, S., Ekram, M., Mukkamala, R., Buus Lassen, N.,

Flesch, B., Hussain, A., and Vatrapu, R. (2016). Forecasting Nike’s Sales using Facebook Data, pages

2447–2456. IEEE, United States.

Borah, A. and Tellis, G. J. (2015). Halo (Spillover) Effects in Social Media: Do Product Recalls of One

Brand Hurt or Help Rival Brands? Journal of Marketing Research, 53(2):143–160.

Breusch, T. S. (1978). Testing for Autocorrelation in Dynamic Linear Models*. Australian Economic Papers,

17(31):334–355.

Chen, K., Xie, D., and Wang, H. (2015). A novel sales prediction approach: Competing vs supporting

networks. Australian Journal of Business and Management Research, 4(10):8–n/a.

Chern, C.-C., Wei, C.-P., Shen, F.-Y., and Fan, Y.-N. (2015). A Sales Forecasting Model for Consumer

Products Based on the Influence of Online Word-of-mouth. Inf. Syst. E-bus. Manag., 13(3):445–473.

Chevalier, J. A. and Mayzlin, D. (2006). The Effect of Word of Mouth on Sales: Online Book Reviews.

Journal of Marketing Research, 43(3):345–354.

Cova, B. and Pace, S. (2006). Brand community of convenience products: new forms of customer
empowerment – the case “my nutella the community”. European Journal of Marketing, 40(9/10):1087–1105.

de Valck, K., van Bruggen, G. H., and Wierenga, B. (2009). Virtual communities: A marketing perspective.

Decision Support Systems, 47(3):185–203.

DeMarco, T. (2013). “buzz” vs. sentiment: Word of mouth’s most important distinction.


Dhar, V. and Chang, E. A. (2009). Does Chatter Matter? The Impact of User-Generated Content on Music

Sales. Journal of Interactive Marketing, 23(4):300–307.

Ding, C., Cheng, H. K., Duan, Y., and Jin, Y. (2017). The power of the “like” button: The impact of social

media on box office. Decision Support Systems, 94:77–84.

Duan, W., Gu, B., and Whinston, A. B. (2008). The dynamics of online word-of-mouth and product

sales—An empirical investigation of the movie industry. Journal of Retailing, 84(2):233–242.

Edosomwan, S., Prakasan, S. K., Kouame, D., Watson, J., and Seymour, T. (2011). The History of Social

Media and its Impact on Business. Journal of Applied Management and Entrepreneurship; Sheffield,

16(3):79–91.

Kalampokis, E., Tambouris, E., and Tarabanis, K. (2013). Understanding the predictive power of social
media. Internet Research, 23(5):544–559.

Facebook. Facebook Pages Terms.

Fortune500 (2016). Fortune500.

Koop, G., Pesaran, M. H., and Potter, S. M. (1996). Impulse response analysis in nonlinear multivariate
models. Journal of Econometrics, 74:119–147.

Granger, C. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control,

2:329–352.

Gruhl, D., Guha, R., Kumar, R., Novak, J., and Tomkins, A. (2005). The Predictive Power of Online

Chatter. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery

in Data Mining, KDD ’05, pages 78–87, New York, NY, USA. ACM.

Gwinner, K. P., Gremler, D. D., and Bitner, M. J. (1998). Relational Benefits in Services Industries: The

Customer’s Perspective. Journal of the Academy of Marketing Science, 26(2):101–114.

Hanna, R., Rohm, A., and Crittenden, V. L. (2011). We’re all connected: The power of the social media

ecosystem. Business Horizons, 54(3):265–273.

Hollis, N. (2011). The value of a social media fan.

Jang, H., Olfman, L., Ko, I., Koh, J., and Kim, K. (2008). The Influence of On-Line Brand Community

Characteristics on Community Commitment and Brand Loyalty. International Journal of Electronic

Commerce, 12(3):57–80.

Jansen, B. J. (2009). The commercial impact of social mediating technologies: Micro-blogging as online

word-of-mouth branding. Extended Abstracts on Human Factors in Computing Systems, pages 3859–3864.

Java, A., Song, X., Finin, T., and Tseng, B. (2007). Why We Twitter: Understanding Microblogging Usage

and Communities. page 56. Springer.


Gummerus, J., Liljander, V., Weman, E., and Pihlstrom, M. (2012). Customer engagement

in a Facebook brand community. Management Research Review, 35(9):857–877.

Cothrel, J. P. (2000). Measuring the success of an online community. Strategy & Leadership, 28(2):17–21.

Kaplan, A. M. and Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of

Social Media. Business Horizons, 53(1):59–68.

Kim, E., Sung, Y., and Kang, H. (2014). Brand followers’ retweeting behavior on Twitter: How brand

relationships influence brand electronic word-of-mouth. Computers in Human Behavior, 37:18–25.

Kozinets, R. (2008). Netnography 2.0. The Handbook of Qualitative Research

Methods in Marketing. Edited by Russell W. Belk. Edward Elgar Publishing,

http://www.academia.edu/1433499/Netnography2.0.TheHandbookofQualitativeResearchMethodsinMarketing.EditedbyRussellW .Belk.

Kwon, E. S., Kim, E., Sung, Y., and Yoo, C. Y. (2014). Brand followers. International Journal of Advertising,

33(4):657–680.

Lassen, N. B., Madsen, R., and Vatrapu, R. (2014). Predicting iPhone Sales from iPhone Tweets. In 2014

IEEE 18th International Enterprise Distributed Object Computing Conference, pages 81–90.

Latané, B. (1981). The psychology of social impact. American Psychologist, 36:343–356.

Liang, T.-P., Ho, Y.-T., Li, Y.-W., and Turban, E. (2011). What drives social commerce: The role of social

support and relationship quality. International Journal of Electronic Commerce, 16(2):69–90.

McAlexander, J. H., Schouten, J. W., and Koenig, H. F. (2002). Building Brand Community. Journal of

Marketing, 66(1):38–54.

Mishne, G. and Glance, N. (2006). Predicting Movie Sales from Blogger Sentiment. Microsoft Research.

Molla, R. (2016). Social studies: Facebook vs. twitter.

Muniz, Albert M., J. and O’Guinn, T. (2001). Brand Community. Journal of Consumer Research, 27(4):412–

432.

Oh, C., Roumani, Y., Nwankpa, J. K., and Hu, H.-F. (2017). Beyond likes and tweets: Consumer engagement

behavior and movie box office in social media. Information & Management, 54(1):25–37.

Onishi, H. and Manchanda, P. Marketing Activity, Blogging, and Sales - MSI Web Site.

Pfaff, B. (2008). VAR, SVAR and SVEC models: Implementation within R package vars. Journal of Statistical Software.

Ping, J. W., Goh, K. Y., Lin, Z., and Goh, A. C. Q. (2012). Does social media brand community membership translate to real sales? A critical evaluation of purchase behavior by fans and non-fans of a Facebook fan page. ECIS 2012 Proceedings.

Gujarati, D. N. and Porter, D. C. (2009). Basic Econometrics. McGraw-Hill Education - Europe, fifth edition.

Rinker, T. (2017). Why sentimentr.

Rui, H., Liu, Y., and Whinston, A. (2013). Whose and what chatter matters? The effect of tweets on movie sales. Decision Support Systems, 55(4):863–870.

Saboo, A. R., Kumar, V., and Ramani, G. (2016). Evaluating the impact of social media activities on human brand sales. International Journal of Research in Marketing, 33(3):524–541.

Sashi, C. (2012). Customer engagement, buyer-seller relationships, and social media. Management Decision, 50(2):253–272.

Schwert, G. (1989). Tests for unit roots: A Monte Carlo investigation. Journal of Business & Economic Statistics, 7:147–159.

Statista (2017). Most popular social networks used by Fortune 500 companies in 2016.

Tiryakian, E. A., Gruzd, A., Wellman, B., and Takhteyev, Y. (2011). Imagining Twitter as an Imagined Community. American Behavioral Scientist, 55(10):1294–1318.

Tufekci, Z. (2014). Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls. arXiv:1403.7400 [physics].

Twitter. Parody, commentary, and fan account policy.

Walsh, M. (2006). Social networking sites fuel e-commerce traffic.

Wu, J.-J., Chen, Y.-H., and Chung, Y.-S. (2010). Trust factors influencing virtual community members: A study of transaction communities. Journal of Business Research, 63(9–10):1025–1032.

Yu, C. H. (2009). Illustrating degrees of freedom in terms of sample size and dimensionality.

Zaglia, M. E. (2013). Brand communities embedded in social networks. Journal of Business Research, 66(2):216–223.

Fan, Z.-P., Che, Y.-J., and Chen, Z.-Y. (2017). Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis. Journal of Business Research, 74:90–100.

Zhu, F. and Zhang, X. M. (2010). Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics. Journal of Marketing, 74(2):133–148.
