Date post: 27-Jul-2020
Page 1: Progress Report Social Media Analyticsstudentnet.cs.manchester.ac.uk/resources/library/thesis_abstracts/... · main feature of Web 2.0 is the ability to allow users to interact with

COMP60990 Research Methods and Professional Skills

Progress Report

Social Media Analytics

8th May 2015

MARCUS KIN ING LEUK

M.Sc. Advanced Computer Science and IT Management

Supervised by: Dr. Ilias Petrounias


Abstract

Online social media is an emerging form of communication and interaction between people. It allows people to publish brief messages and share information with others on platforms such as Facebook and Twitter. This has increased the importance of online social influence, the ability to affect another person’s thoughts, emotions or behaviours. Individuals in the online community with this ability are called social media influencers, and they are often used to spread opinions and market products.

The main objective of the project is to identify topic-specific influential members in the Twitter context. Our study reports important findings, discusses various approaches and addresses the problems faced by other researchers when identifying influencers. We also propose a framework consisting of a list of criteria for identifying influencers. This set of criteria was defined by looking at influencers in both the traditional and the online social network context.

Our work starts by identifying a topic on Twitter and collecting a dataset of Twitter user accounts related to that topic. We will then select only personal accounts for further analysis. Next, we will apply our framework to rank the users by their influence on the specific topic. Finally, we will evaluate our framework by applying approaches proposed by other researchers to the same data and comparing the resulting influencers.


Table of contents

Abstract

Table of contents

1. Introduction

1.1 Aims and objectives

1.2 Report layout

2. Background

2.1 Measuring influence on Twitter

2.2 Motivation

3. Literature Review

3.1 Criteria of social media influencers

3.2 Approaches for identifying influencers

3.3 Outcome

3.4 Concluding remarks

3.5 Importance of our work

4. Project Progress

4.1 Data collection

4.2 Classification of Twitter user accounts

4.3 Proposed framework

4.4 Evaluation

5. Project Plan

6. Conclusion

References


1. Introduction

The creation of Web 2.0, which emphasises user-generated content, usability and interoperability, has turned former online information readers into information producers. The main feature of Web 2.0 is that it allows users to interact with one another in the social media network (O'Reilly 2009). This led to the creation of many social networking sites such as Facebook, Twitter, Google+ and LinkedIn. Besides giving millions of people the opportunity to communicate, these sites led to the creation of online communities, which encourage people to share information and exchange opinions on common topics. Nowadays, it is common practice for people to share their thoughts and read the opinions of others on a wide range of topics, from reviewing products and discussing politics to expressing personal emotions.

1.1 Aims and objectives

The main purpose of the project is to develop a suitable approach for identifying topic-specific influential users on the Twitter network. To do this, a set of criteria that characterise an influencer has to be identified and used to develop a theoretical framework. Then, a manageable dataset of user accounts related to a specific topic on Twitter will be collected and used to evaluate the proposed framework. The final results should identify the users that are influential on a specific topic on Twitter.

1.2 Report layout

The remainder of this report is structured as follows. Chapter 2 covers essential background knowledge specific to the project and the motivation for undertaking it. Chapter 3 reviews multiple approaches for identifying social media influencers covered in the literature and describes the importance of our work. Chapter 4 describes the whole process of the project, including the initial proposed framework and the evaluation plan. Chapter 5 presents the future plans for completing the project. Finally, Chapter 6 concludes the report.

2. Background

This chapter briefly covers the background knowledge specific to the project, including the reason for choosing Twitter as a research platform and the motivation for undertaking the project. Related work in the field is discussed in Chapter 3.

2.1 Measuring influence on Twitter

As of March 2015, 70% of Internet users had active social media accounts. Facebook and Twitter had an estimated 1,415 million and 288 million monthly active user accounts respectively, which made them two of the most popular social media platforms (Kemp 2015). Twitter is a micro-blogging service that allows users to publish brief messages (tweets), which may include links to other websites. Its most attractive feature is the 140-character limit on each tweet, which encourages users to keep tweets short and capture only the important points of a topic. Unlike social networking sites such as Facebook, where both parties have to agree in order to become friends, Twitter is built on the concept of following: users may follow anyone they want without the other party having to follow them back (Weng, Lim et al. 2010). These features make Twitter an excellent platform for marketers to launch effective marketing campaigns by targeting influencers. Hence, Twitter was chosen as the research platform for the project.


2.2 Motivation

Word-of-Mouth (WOM) is an informal communication behaviour in which consumers exchange experiences of specific products and services with each other (Westbrook 1987). In the online community, WOM is also known as viral marketing. Since social media are used extensively nowadays and WOM has a great impact on purchasing decisions, marketers can use viral marketing methods to spread knowledge about their brand, products or services among customers at a low marketing cost. However, customers can be overwhelmed by the massive amount of reviews and opinions generated by online communities. Customers decide for themselves which opinions they consider trustworthy, so it is important for marketers to accurately identify suitable opinion leaders or influencers for the company.

The Oxford dictionary defines influence as “the capacity to have an effect on the character, development, or the behaviour of someone or something.” Feng et al. (2011) described social influence as the power possessed by a person to affect the thoughts or actions of others. According to Wu et al. (2010), more than half of the content on Twitter was contributed by only 0.05% of the Twitter population. This group of people are online influencers, most of whom are celebrities, politicians and the news media. Most information was produced by the news media, whereas celebrities and politicians had the most followers. In contrast, most Twitter users have less influence because they often only read and share information created by the influencers.

Influencers are very important to businesses because they can use WOM to cause a chain reaction in which information spreads quickly across a wide audience. Like a virus, this marketing strategy takes advantage of rapid multiplication to spread the marketing message to millions of people (Wilson 2000). By identifying and persuading these influencers, companies can market their products to more people in a shorter time and at minimal marketing cost. As Domingos (2005) noted, in traditional marketing customers receive offers only when the expected profit exceeds the cost of the offer, whereas in viral marketing, offering products to influencers for free can pay back many times over in sales to other customers. Therefore, it is important for companies to realise that online influencers can be essential assets to their business.

3. Literature Review

This chapter describes related work done by others, including a comparison and discussion of existing techniques for identifying influencers. We identify areas that were overlooked by others and describe the importance of our work. The details of our approach are discussed in Chapter 4.

3.1 Criteria of social media influencers

To search for influencers, it is important to start by identifying a set of criteria that make a person an influencer. Keller and Berry (2003), Akritidis et al. (2011) and Agarwal et al. (2008) held similar views on some of the characteristics that influencers should have, which are summarised below:

• Activity generation - An influencer’s post should generate activity by initiating discussions among others. This can be measured by the number of comments that the post receives. A post that receives a large number of comments shows that many people spent time reading and exchanging thoughts about it, which indicates that it may have significant influence on those people (Agarwal, Liu et al. 2008).


• Recognition - An influencer’s post should be recognised by many. This can be measured indirectly by the number of inlinks (references to the post in other posts). A high number of inlinks suggests that the post is widely recognised. If the referring posts are themselves highly influential, the referred post becomes even more influential (Agarwal, Liu et al. 2008).

• Eloquence - Influencers are usually expressive and persuasive. They are the people others look up to for advice. They believe that WOM is more important than traditional media, and they are not afraid to share their opinions on what they like or dislike (Keller and Berry 2003). Normally, these traits can be seen in the quality of the user’s posts. The quality of a post can be measured in various ways, such as vocabulary usage, fluency and content analysis. However, these properties are difficult to analyse because of the informal nature of most social networks. Instead, Akritidis et al. (2011) and Agarwal et al. (2008) used the length of a post to determine whether it is influential. They argued that users have no reason to bore their readers with lengthy posts, so a lengthy post often indicates some necessity to write at length.

• Novelty - As suggested by Keller and Berry (2003), innovative and unique posts exert more influence. Influencers tend to be optimistic and open to new things. They are also often trendsetters, because they normally acquire information before others and like to share it with their followers. To determine whether a post is novel, we can look at its number of outlinks (the other posts that it refers to). A small number of outlinks indicates that the post refers to few or no other posts or articles, which means that it is more likely to be novel. The number of comments is also correlated with the number of outlinks, in that more novel posts attract the attention of more people (Agarwal, Liu et al. 2008).
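As a rough illustration of how the four post-level criteria above could be turned into a single number, the following Python sketch combines comments, inlinks, outlinks and post length in a weighted sum. The `Post` fields, the `post_score` function and all weights are hypothetical choices made for this example; they are not the model of Agarwal et al. (2008) or of any other cited work.

```python
from dataclasses import dataclass


@dataclass
class Post:
    comments: int  # activity generation: number of comments received
    inlinks: int   # recognition: number of other posts referring to this one
    outlinks: int  # novelty: number of posts this one refers to (fewer is better)
    length: int    # eloquence proxy: post length in words


def post_score(post, w_comments=1.0, w_in=1.0, w_out=1.0, w_len=0.01):
    """Hypothetical weighted sum; the weights are illustrative assumptions."""
    return (w_comments * post.comments
            + w_in * post.inlinks
            - w_out * post.outlinks
            + w_len * post.length)


# Toy data: an original, widely discussed post and a short post that
# mostly links to other material.
original = Post(comments=12, inlinks=5, outlinks=0, length=300)
derivative = Post(comments=2, inlinks=1, outlinks=4, length=80)

print(post_score(original), post_score(derivative))
```

In this toy example the original post scores higher than the derivative one, which matches the intuition behind the criteria. How the weights should be set is left open here.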

The four criteria stated above are basic properties of an influencer’s post. Zhou et al. (2009) suggested further criteria, including the activeness of the user and the date they joined social media. They believed that users who are very active and have been on social media for a long time are more likely to be considered influencers.

In contrast, the findings of Agarwal et al. (2008) conflict with those of Zhou et al. (2009). Agarwal et al. divided social media users into four groups: influential and active, influential but inactive, not influential but active, and not influential and inactive. Based on their findings, active social media users are not necessarily influential, and influential users may be inactive (Agarwal, Liu et al. 2008). Moreover, regardless of a user’s activeness and how long they have been on social media, Akritidis et al. (2011) claimed that any user can rise to become an influencer if they have recently published several influential posts that had an impact on the online community.

The latter claim was also supported by Agarwal et al. (2008), who categorised influencers into four groups based on their temporal patterns:

• Long-term influencers - users that are able to maintain their influential status for a very long time.

• Average-term influencers - users that are influential for 4-5 months.

• Transient influencers - users that can maintain their influential status for only 1-2 months.

• Burgeoning influencers - users who have recently emerged as influencers and who might become any of the three types stated above in the future.

In fact, the duration of influence indirectly indicates the user’s reputation and knowledge. For instance, long-term influencers often gain the highest level of trust from others and may also be viewed as experts in their particular field. Therefore, long-term influencers are considered the most influential.


Alternatively, users can be influencers if they are well connected with other users. This can be measured by their degree of centrality. Users with a high outdegree centrality (the number of direct connections the user has to others) are expected to be influencers, because their many connections make them stand out from others. Having many connections is beneficial because such users can access more resources and find alternative ways to satisfy their needs, which also makes them less dependent on others (Hanneman and Riddle 2005). Ya-ting and Jing-min (2011) divided centrality into three categories: degree centrality, betweenness centrality and closeness centrality. Degree centrality measures all the direct connections of the user, which indicates the user’s ability to interact with others. It is important to note that a user’s direct connections alone do not determine the level of influence. Betweenness centrality measures how strategic the user’s position is in the network, i.e., the user is in a good position if he lies on the shortest paths between others. This indicates how well the user is able to control resources. Lastly, closeness centrality indicates how independent a user is of others. A user who is less dependent on others has a higher centrality, because other users often depend on him.
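The three centrality measures can be made concrete with a small sketch. The Python code below computes them with the standard textbook definitions on a toy undirected, unweighted network; the graph, the user names and the function names are assumptions made for this example, and the cited authors may use different normalisations.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other users that each user is directly connected to."""
    n = len(graph)
    return {u: len(neighbours) / (n - 1) for u, neighbours in graph.items()}


def closeness_centrality(graph):
    """Reachable users divided by the sum of shortest-path distances to them."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(graph):
    """Number of shortest paths between other users that pass through each
    user (Brandes' accumulation scheme, unnormalised)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted from both ends.
    return {v: b / 2 for v, b in bc.items()}


# Toy undirected network (adjacency lists): a-b, b-c, b-d, d-e.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
print(degree["b"], closeness["b"], betweenness["b"])
```

In this toy network user `b` ranks highest on all three measures, since `b` has the most direct connections and lies on most shortest paths between the other users.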

In addition, empirical evidence suggests that most influencers exhibit specific behaviours and that their influential status was not gained accidentally. Quercia et al. (2011) noted that a user’s level of influence is determined by the level of user involvement and audience engagement. Cha et al. (2010) added that influence is gained through concerted effort and that personal involvement is crucial for maintaining it. They also concluded that anyone can gain influence by focusing on a single topic and publishing creative, unique and insightful posts.

In social media, the use of language in one’s posts is closely linked to social influence. Influencers must have good communication skills, and their choice of words should be persuasive enough to convince their followers. Quercia et al. (2011) found that influencers often structure their tweets in a similar linguistic manner, and that most of them include negative sentiments in their posts. Furthermore, linguistic qualities can reflect the user’s emotions and personality. Thus, it is important to look into linguistic features when identifying influencers.

3.2 Approaches for identifying influencers

Over the past few years, various tools and techniques have been created for detecting influencers. Widely used tools include Simply Measured, Twtrland, Followerwonk and Klout. These tools are popular because of their striking graphical presentation of data and their user-friendly interfaces. Most of them are also free and readily available. Despite these benefits, the results they produce are still not convincing, because their analysis is often based on quantitative measures alone, such as the number of tweets posted, the number of followers and the number of retweets. Features such as the quality of posts and linguistic structure are not taken into consideration.

The number of retweets alone is not sufficient for determining influence. Hubspot conducted research on more than 2.7 million tweets that contained a link to another website, and the results showed that most users retweet without clicking on the link to look at the content they are retweeting, which means that information is often passed on blindly (Bennett 2012). The frequency of posts is also unreliable for determining influence, because numerous tools now allow tweets to be posted to Twitter automatically at a preset frequency.


Besides, the French Huffington Post conducted an experiment to show that organisations and individuals can buy fake followers to boost their popularity and gain the trust of customers. In the experiment, they bought more than 50,000 followers on a budget of only 33 euros (Provost 2012), which shows how simple and cheap it is to buy followers. Furthermore, it is commonly assumed that a user’s level of impact is determined by the number of followers he has. This assumption holds only if every tweet published is read by all of the followers. Research has also shown that popularity and influence are only weakly correlated. For information to spread across a network, an individual needs followers who actively forward information to others, rather than passive readers who do nothing with it (Romero, Galuba et al. 2011). As a result, many marketing professionals still favour the “manual” approach to identifying influencers over relying on tools, even though it may be time-consuming.

To identify influencers accurately without relying on the tools described above, various techniques have been applied to raw data collected using the Twitter Application Programming Interface (API). Cha et al. (2010) made an in-depth comparison of three measures that represent different types of influence of a person:

• Indegree influence - measured by the number of followers of a user. The number of followers determines the size of the audience for that user.

• Retweet influence - measured by the number of retweets that contain one’s name. The number of retweets indicates the user’s ability to create quality posts that others think are worth sharing.

• Mention influence - measured by the number of mentions that contain one’s name. This indicates how well the user can engage others in conversation.
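The three measures in the list above can be counted with very little code once the raw data is available. The Python sketch below does so on made-up data. The field names, the sample tweets and the text-based detection of retweets (a tweet starting with `RT @name`) and mentions (any other `@name`) are assumptions for illustration only and do not reproduce the exact procedure of Cha et al. (2010).

```python
import re
from collections import Counter

RETWEET = re.compile(r"^RT @(\w+)")  # assumed retweet convention
MENTION = re.compile(r"@(\w+)")


def influence_counts(follows, tweets):
    """Return (indegree, retweets, mentions) counters keyed by user name.

    follows: iterable of (follower, followee) pairs.
    tweets:  iterable of dicts with "author" and "text" keys.
    """
    indegree = Counter(followee for _, followee in follows)
    retweets = Counter()
    mentions = Counter()
    for tweet in tweets:
        match = RETWEET.match(tweet["text"])
        if match:
            retweets[match.group(1)] += 1
        else:
            mentions.update(MENTION.findall(tweet["text"]))
    return indegree, retweets, mentions


# Made-up sample data.
follows = [("bob", "alice"), ("carol", "alice"), ("alice", "dave")]
tweets = [
    {"author": "bob", "text": "RT @alice: new phone review"},
    {"author": "dave", "text": "RT @alice: new phone review"},
    {"author": "carol", "text": "@alice what do you think about it?"},
    {"author": "alice", "text": "thanks @dave"},
]

indegree, retweets, mentions = influence_counts(follows, tweets)
print(indegree.most_common(1), retweets.most_common(1), mentions.most_common(2))
```

Each counter gives a separate ranking of users, which is what makes it possible to compare the three types of influence against each other.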


Researchers have used many other approaches to identify influencers. Kwak et al. (2010) carried out similar experiments in which they ranked influencers by PageRank, by the number of followers (indegree) and by the number of retweets. The PageRank algorithm was first used by Google to rank web pages in its search results (Page, Brin et al. 1999). PageRank has been used to measure influence because the influence of a user is similar to the concept of “authority” of a web page: “a Twitterer has high influence if the sum of influence of his followers is high; at the same time, his influence on each follower is determined by the relative amount of content the follower received from him” (Weng, Lim et al. 2010).
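To make the idea concrete, here is a minimal power-iteration sketch of PageRank on a follower graph, written in Python. Each edge points from a follower to the user being followed, so rank flows towards users who are followed by highly ranked users. The damping factor of 0.85, the fixed number of iterations, the handling of users who follow nobody and the sample data are common textbook choices assumed for this example; they are not details taken from Kwak et al. (2010).

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over (follower, followee) pairs."""
    users = sorted({user for edge in follows for user in edge})
    n = len(users)
    followees = {u: [] for u in users}
    for follower, followee in follows:
        followees[follower].append(followee)

    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            if followees[u]:
                # Split this user's rank evenly among the users they follow.
                share = damping * rank[u] / len(followees[u])
                for v in followees[u]:
                    new_rank[v] += share
            else:
                # A user who follows nobody spreads rank evenly over everyone.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Made-up sample data: alice is followed by three users and follows bob.
follows = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "bob")]
rank = pagerank(follows)
print(sorted(rank, key=rank.get, reverse=True))
```

In this toy graph `alice` receives the highest rank, and `bob` ranks above `carol` and `dave` because his only follower is the highly ranked `alice`.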

Weng et al. (2010) criticised PageRank for ignoring the topical interests of Twitter users, which affect how they influence others. They argued that most users may not read tweets on topics that do not interest them, so influence varies from topic to topic. Instead, they proposed an approach that takes into account both topical similarity and the link structure among Twitter users, which they called TwitterRank. The TwitterRank algorithm starts by generating a directed graph of the following relationships between users and then uses a random surfer to visit each connected user according to a transition probability. The transition probability is calculated from the topical interests that two users have in common: the more topical interests the two users share, the higher the transition probability. By repeating the process, a topic-specific relationship between the users can be constructed.
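The following Python sketch illustrates the general idea of a topic-sensitive random surfer. It is a deliberately simplified variant written for this example and is not the TwitterRank algorithm as published by Weng et al. (2010). It assumes that each user has a single interest score between 0 and 1 for the topic, that similarity is one minus the absolute difference of those scores, that the surfer restarts at a uniformly random user, and that all names and data are made up.

```python
def topic_rank(follows, topic_interest, damping=0.85, iterations=50):
    """Random walk over (follower, followee) pairs, biased by topical similarity.

    topic_interest maps each user to an interest score in [0, 1] for one topic.
    """
    users = sorted(topic_interest)
    n = len(users)
    followees = {u: [] for u in users}
    for follower, followee in follows:
        followees[follower].append(followee)

    def similarity(a, b):
        # Assumed similarity: identical interest gives 1, opposite gives 0.
        return 1.0 - abs(topic_interest[a] - topic_interest[b])

    # Transition probabilities: a follower moves to each followee with a
    # probability proportional to their topical similarity.
    transition = {}
    for u in users:
        weights = {v: similarity(u, v) for v in followees[u]}
        total = sum(weights.values())
        if total > 0:
            transition[u] = {v: w / total for v, w in weights.items()}
        else:
            # No followees (or no shared interest): jump to a random user.
            transition[u] = {v: 1.0 / n for v in users}

    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            for v, p in transition[u].items():
                new_rank[v] += damping * rank[u] * p
        rank = new_rank
    return rank


# Made-up data: "fan" follows both an on-topic expert and an off-topic celebrity.
topic_interest = {"fan": 0.9, "expert": 0.9, "celeb": 0.1}
follows = [("fan", "expert"), ("fan", "celeb"), ("expert", "fan"), ("celeb", "fan")]
rank = topic_rank(follows, topic_interest)
print(rank)
```

Although `expert` and `celeb` have the same single follower, the on-topic `expert` receives the higher score for this topic, which is the effect that plain PageRank cannot capture.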

Furthermore, the report by Leavitt et al. (2009) suggested that using the number of followers alone to determine influence is acceptable only if Twitter is treated as an ordinary broadcast medium and we ignore the fact that Twitter users can interact with the content on the platform. They therefore proposed a measure that better explains how influence occurs in the Twitter network: the ratio of followers to followees.
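The ratio itself is simple arithmetic, as the short Python sketch below shows. The function name, the treatment of accounts that follow nobody and the sample numbers are assumptions made for this example.

```python
def follower_ratio(followers, followees):
    """Ratio of followers to followees for one account."""
    if followees == 0:
        # Assumed convention for accounts that follow nobody.
        return float("inf") if followers > 0 else 0.0
    return followers / followees


# Made-up accounts: one mostly followed by others, one mostly following others.
print(follower_ratio(1000, 100))
print(follower_ratio(50, 500))
```

A ratio well above 1 describes an account that is followed far more than it follows, while a ratio well below 1 describes an account that mainly consumes content.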


3.3 Outcome

Research conducted by Weng et al. (2010) showed that 72.4% of Twitter users follow more than 80% of their followers, and 80.5% of Twitter users have 80% of their friends follow them back. These results make it clear that reciprocity exists in the Twitter context, and the authors suggested two possible explanations. First, it might be casual following, because it is so easy to follow someone on Twitter and some users follow others back simply out of etiquette. Alternatively, it might be homophily, where Twitter users follow each other because they have similar topical interests. McPherson et al. (2001) described homophily as the principle that influence and connection between similar people (i.e., people of similar culture, background, characteristics or interests) occur at a higher rate than among dissimilar people, and noted that it is present in many social networks. If most Twitter users follow others for the first reason, then the number of followers is not a reliable measure of influence. However, the research of Weng et al. showed that homophily does exist in the Twitter context, which means that some users choose seriously whom to follow and only follow people with similar topical interests.

Cha et al. (2010) found that the most followed users were politicians, celebrities and news sources. They therefore suggested that these are the users to look for if a lot of attention is needed from a wide audience. Retweets, on the other hand, represent influence beyond one-to-one interaction between users. The most retweeted users were businessmen, news sources and content aggregation services. They also suggested that retweeting is a powerful method of emphasising a message: the probability that people accept a new idea increases with the number of people who repeat the same message (Watts and Dodds 2007). Lastly, the users that were mentioned the most were often celebrities. Celebrities have many fans and celebrity gossip is always a popular topic on Twitter, so they are considered the centre of public attention (Cha, Haddadi et al. 2010).

Cha et al. (2010) also found a strong correlation between retweet influence and mention influence. This means that users who are retweeted frequently are also often mentioned, and vice versa. Indegree, on the other hand, was not related to the other measures, which explains why having many followers does not by itself make users good at engaging in conversations and spreading information.
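One way to quantify how closely two influence measures agree is a rank correlation. The text above does not state which coefficient was used, so the Python sketch below assumes Spearman's rank correlation, computed as the Pearson correlation of the ranks. All numbers are made up for illustration, and the code assumes that neither list is constant.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up counts for five users, listed in the same user order.
retweets = [120, 40, 300, 5, 60]
mentions = [90, 30, 500, 2, 45]
followers = [10, 5000, 200, 90000, 30]

print(spearman(retweets, mentions))
print(spearman(retweets, followers))
```

With these numbers the retweet and mention rankings agree perfectly, while the follower ranking disagrees with both, mirroring the kind of pattern described above.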

Besides, most influencers are influential over a variety of topics, which means that they are popular opinion leaders who can be relied on to spread information even if they are not experts in certain areas. It is also more effective to start a viral marketing campaign by targeting top influencers in social media than by employing a large number of non-popular users (Cha, Haddadi et al. 2010).

Kwak et al. (2010) reported that PageRank and indegree ranked the influencers in a similar manner, whereas the ranking by number of retweets was different. As mentioned in Chapter 3.2, PageRank was developed with ranking web pages as its main purpose. Furthermore, experimental results showed that TwitterRank outperformed PageRank in identifying influencers (Weng, Lim et al. 2010).

3.4 Concluding remarks

Relying on the number of followers to identify influencers is a common mistake among marketers. In fact, this approach reveals only the popularity of a person. Moreover, research has found that followers can easily be bought at low cost. Therefore, the number of followers should be taken only as a contributing factor to a person’s degree of influence.

Many features of influencers have been identified, which boil down to four main criteria: activity generation, recognition, eloquence and novelty. Other criteria, such as the user’s activeness and how long ago the user joined social media, have also been suggested. However, influential users are not always active users, and vice versa. The length of time for which a user maintains an influential status helps others judge the user’s trustworthiness. Furthermore, influencers should have a reasonable level of user involvement and frequent audience engagement, and they should be well connected with others. Linguistic features should also be taken into consideration, as they reveal the user’s emotions and personality.

In short, there are certainly other potential criteria for an influencer besides the ones described above. It is clear that no criterion on its own is sufficient to identify influencers accurately. Therefore, the criteria will be used jointly in the project.

3.5 Importance of our work

Most approaches developed by others focused on finding influencers based on readily available data collected from Twitter, such as the number of tweets, retweets, followers (indegree) and mentions. For example, Cha et al. (2010) used only the number of retweets and mentions to determine influencers, whereas others used algorithms such as PageRank and TwitterRank. These algorithms were complex and were developed with ranking web pages in mind. We think that the characteristics of influencers are far more complicated and that most algorithms do not take them into account. A common mistake made by most researchers was to identify influencers based on separate criteria. In addition, when identifying the list of criteria, most researchers limited their thinking to the data that Twitter can provide and hence overlooked some important aspects.

Instead, we looked at influence in a traditional context, assuming that the Internet does not exist, and focused on what makes an individual influential. People are more easily influenced by family members than by strangers, and the main reason is trust. For example, most children go to their parents for advice and do what their parents ask without hesitation, because they know that their parents will not harm them. This phenomenon also applies to adults, even though it is less obvious. A recent example is the huge fashion influence of Kate Middleton: when she wears a dress, consumers go to stores to buy the same dress. This suggests that influencers are trusted by others because of their experience and knowledge of a specific matter. In traditional social networks, Tedeschi et al. (1972) suggested that an influencer should possess the characteristics of a leader, which include self-confidence, self-control and the need for achievement. The need for power was also found to be related to influence (Mowday 1978).

By using this approach, we came up with a list of criteria for influencers in the traditional social network context. We also looked at influencers in online social media. It was found that there are some differences and potential features that could be added to the list of criteria. For example, user activeness is important because on Twitter there are millions of followers and followees who do not physically meet and talk to each other. So, influencers need to be active in sharing their opinions with others to gain recognition.


In short, other related work looked at influencers from the online social media context only.

Our work is different because we combined the insights from both the traditional and online social

networks, which led to an expanded set of criteria to identify influencers. These criteria will form

the basis of our work, which is shown in our proposed framework in Chapter 4.3.

4. Project Progress

This chapter describes the progress of the project so far and also the overall process of identifying

influencers. First, a dataset of user accounts related to a specific topic has to be collected from

Twitter. Then, the accounts have to be filtered and classified. After that, users will be ranked

according to their degree of influence, based on the techniques proposed in our framework. Finally,

the result will be evaluated to ensure that the influencers are identified correctly.

4.1 Data collection

Twitter has 302 million monthly active user accounts, and an average of 500 million tweets are posted per day. In order to identify influencers for a specific topic, we first need to determine a fairly recent topic on Twitter and retrieve a manageable dataset of users who actively discuss that topic. These data can be collected using the REST API provided by Twitter. The

API allows developers to read and write Twitter data, which includes reading user profiles and

follower data.

Searching for the keywords of a specific topic enables us to retrieve a collection of tweets related to that topic. In addition, other information about the users is retrieved together with the tweets. Table 1 shows a sample of the data collected using the API, with

some brief explanation of a few important fields.
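As a rough illustration of how the fields in Table 1 could be pulled out of one collected tweet, the following sketch works on an already-downloaded record. It assumes the record follows the REST API v1.1 layout, in which account-level fields are nested under a `user` object; the function name `extract_fields` and the output keys are hypothetical, and the tweet text is abbreviated.

```python
from datetime import datetime

# Sample record using values from Table 1. The nesting of account fields
# under "user" is an assumption about the API response layout.
SAMPLE_TWEET = {
    "created_at": "Mon Jan 26 22:17:07 +0000 2015",
    "id": 559837463074447360,
    "text": "Extend your battery life and protect your iPhone 6 #iphone",
    "in_reply_to_status_id": 559806614761664512,
    "user": {
        "followers_count": 1947,
        "friends_count": 1081,
        "listed_count": 11,
        "statuses_count": 64613,
        "lang": "en",
    },
}


def extract_fields(tweet):
    """Flatten the fields needed later into one dictionary (hypothetical helper)."""
    user = tweet.get("user", {})
    return {
        "tweet_id": tweet["id"],
        # Parse the timestamp format shown in Table 1 into a datetime.
        "created_at": datetime.strptime(
            tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"
        ),
        "text": tweet["text"],
        # A null in_reply_to_status_id means the tweet is not a reply.
        "is_reply": tweet.get("in_reply_to_status_id") is not None,
        "followers": user.get("followers_count", 0),
        "followees": user.get("friends_count", 0),
        "listed": user.get("listed_count", 0),
        "tweets_posted": user.get("statuses_count", 0),
    }


record = extract_fields(SAMPLE_TWEET)
```

Flattening each tweet in this way keeps the later filtering and ranking steps independent of the exact response format.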


Table 1: Sample data collected from Twitter with description of specific fields

| Sample field | Description |
|---|---|
| "created_at":"Mon Jan 26 22:17:07 +0000 2015" | The time when the tweet was created. |
| id:559837463074447360 | The unique identifier for this tweet. |
| text:"Extend your battery life and protect your iPhone 6 while saving 64% [Deals] http:\/\/t.co\/CsHXisph54 #iphone" | The content of this tweet. |
| in_reply_to_status_id:559806614761664512 | This indicates that the tweet is a reply to another tweet. The ID represents the original tweet’s ID. This field can be null if the tweet is not a reply. |
| name:"KARA JAV Cams" | The user’s name defined by the user. This is not necessarily the user’s actual name. |
| location:"Chester | The location of the account defined by the user. This field can be null. |
| followers_count:1947 | The number of followers that the user currently has. |
| friends_count:1081 | The number of people that the user is following. |
| listed_count:11 | The number of public lists that the user is in. |
| favourites_count:3293 | The number of tweets that the user has marked as favourite. |
| statuses_count:64613 | The number of tweets posted by the user. |
| lang:"en" | The user interface language set by the user. It does not necessarily represent the language in which the tweet was posted. |

4.2 Classification of Twitter user accounts

After collecting data from Twitter, the user accounts have to be filtered so that the amount of data for further analysis is manageable; filtering could also improve the accuracy of the result. There is a diverse range of Twitter accounts because different users create accounts for different purposes. Generally, Twitter accounts can be classified into three categories: personal accounts, supervised non-personal accounts and bot accounts. Personal accounts belong to individual users, who can post anything without being filtered or controlled by anyone else. These accounts often exhibit a wide range of behaviours and they are the only type of account that would be useful for our project. On the other hand, non-personal accounts belong to organisations or a group of people with common interests. The activities of these accounts are often heavily controlled and their tweets often express the opinion of the group as a whole, instead of personal views. Therefore, they should be excluded from further analysis. Finally, bot accounts are computer

programs that are programmed to automatically generate tweets. These accounts could be spam or

fake, which would affect the accuracy of our result and so, they should also be eliminated.

4.3 Proposed framework

After collecting and filtering the data, a list of criteria has to be determined to help identify

influencers. The criteria described below represent some of the properties that a user should possess

in order to become an influencer, together with the methods for measuring the criteria. It is also

important to note that not all criteria can be implemented as automatic procedures, since analysis

tools do not exist for all of them.

• Activeness — Influencers are expected to actively express opinions and their interests with

others. They must also have a strong motivation to share information with their followers. This

criterion is compulsory for influencers because without user activity, it is difficult to pass on

information to others, which is the first step of influencing other people. The activeness of a user

can be measured by looking at the number of tweets and retweets published by the user.

• Experience — The experience of a user is closely related to the user’s trustworthiness. People

tend to trust an individual more when that individual is more experienced. For example, children often trust the advice given by their parents in certain situations because they think that their parents, being older, might have gone through similar situations in the past. In order to measure experience, the date on which a topic first emerged in public is compared with the date on which the user first posted about that topic. The longer the user has been posting about the topic since it emerged, the more experienced the user is considered to be.


• Knowledge & Reputation — It is justifiable to say that influencers are often experts in the field

they discuss the most. Besides, the amount of knowledge possessed by the user is correlated with the online reputation of the user. For example, a user with superior knowledge of a specific brand and its products will gain greater trust from others who are seeking advice on a product before purchasing it. According to Agarwal et al. (2008), the temporal pattern of the user determines the level of knowledge and reputation of that user. They claimed that users who are able to maintain long-term influence may gain greater trust from others. Besides, these users are not just experts on a particular topic, but rather on topics of a similar theme. For example, users who know a lot about the iPhone 6 are often experts on other Apple products as well.

• Activity generation — Influencers should have the ability to initiate discussion among others.

This can be measured by calculating the average number of tweets posted by the user per day and

also the average number of comments that the user gets for each tweet. The average number of

tweets posted per day indicates the user’s activity, which can be calculated as follows:

\[ \text{Average activity rate} = \frac{\text{Number of tweets posted}}{\text{Age of account}} \tag{1} \]

Furthermore, if a tweet receives a large number of comments, it means that many people spend time reading and exchanging thoughts about it, which indicates the level of influence that the tweet holds (Agarwal et al. 2008).

• Activity rate + account age — According to Zhou et al. (2009), users that are very active and

have been on social media for a long time are more likely to be considered as influencers.

Therefore, the combined measure of the average activity rate (1) and the account age can help us

determine influencers.

\[ \text{Activity rate and account age} = 0.5 \times \text{Normalised average activity rate} + 0.5 \times \text{Normalised account age} \tag{2} \]

• Recognition — The number of followers gives an estimate of the size of the audience. If the user

has very few followers, his posts would have less impact on others. Besides, the average

frequency of a user’s tweet being retweeted indicates the rate at which the message is spread

across Twitter. For example, if a three-day-old tweet and a thirty-day-old tweet both have the same number of retweets, the former is said to have gained more recognition.

\[ \text{Average retweet frequency} = \frac{\text{Average number of retweets}}{\text{Average age of the tweet}} \tag{3} \]

Recognition can also be measured indirectly by looking at the number of inlinks (tweets referenced in other tweets). A high number of inlinks suggests that the tweet is highly recognisable by most people (Agarwal et al. 2008). So, combining the number of followers, the average retweet frequency (3) and the number of inlinks allows us to measure the level of recognition of the user.

• Twitter Follower-Followee(TFF) Ratio — This calculates the ratio of a user’s followers to

their followees (the people whom the user follows), which categorises users into different types. If the ratio is close to 1 (nearly the same number of followers and followees), it means that the user might be a listener or simply seeking knowledge. However, if the ratio approaches infinity, the user is probably very motivated to share information with others and is very confident in what he posts. Finally, if the ratio is close to 0 (low number of followers, but high number of followees), that user can be treated as a spammer (Leavitt et al. 2009).

\[ \text{TFF Ratio} = \frac{\text{Number of followers}}{\text{Number of followees}} \tag{4} \]

• Quality of the tweets posted — Users who frequently post nonsense on Twitter will not be

considered influencers. This criterion has to be measured using a combination of two other criteria, namely the average number of retweets and the average favourite count. Retweeting is the action of broadcasting tweets from others to your followers, which can also be seen as reinforcing the message. So, if someone retweets, it means that they find the tweet worth sharing with others. Therefore, a higher number of retweets indicates a higher quality of the tweet. Next, the average favourite count measures how many times a user’s tweets are marked as favourite by others. This feature is often used by Twitter users to express their acknowledgment and acceptance of the message, which could also indicate the quality of the tweet.

• Novelty — Influencers must be innovative whereby they have a high tendency to accept new

things and share new ideas with others. This can be measured by comparing the date on which a topic first rose in popularity with the date on which the user first discussed the topic. In order to do this, a fairly current topic has to be identified from news sources and the date on which it first appeared has to be recorded. If the date of the user’s first post about the topic is very close to the date it appeared in the news, then the user might be one of the first users to tweet about that topic. To further increase the accuracy of determining the novelty of a post, it is important to check whether the user’s tweets consist of references (e.g., retweets) to other tweets or links to other websites.

• Centrality — The degree of centrality indicates how well the user is connected with others.

Centrality analysis can be used to find important members in the Twitter network. The degree of

centrality tells whether an individual plays a central role in the network. Users who possess a high outdegree (the number of direct connections the user has to others) are expected to be influencers, because outdegree represents the ability of the user to socialise with others. Hanneman and Riddle (2005) suggested that these users are able to acquire resources easily and therefore, they are less dependent on others. In addition, the centrality of a user can be divided into three categories:

a) Degree centrality — measures all the direct connections of the user, which indicates

the user’s ability to interact with others.


b) Betweenness centrality — measures how strategic the user’s position is in the

network, i.e., the user is in a good position if he is on the shortest path between

others. This indicates how well the user is able to control resources.

c) Closeness centrality — indicates how independent a user is to others. If a user is

less dependent on others, he has a higher degree of centrality because other users

often depend on him.
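The three centrality measures above can be sketched with the standard graph-theoretic definitions. This is only an illustration under stated assumptions: the network is treated as undirected and unweighted, every user appears as a key in the adjacency dictionary, closeness is taken as the inverse of the average shortest-path distance, and betweenness is computed with Brandes' accumulation scheme. The function names and the toy graph are hypothetical.

```python
from collections import deque


def _bfs(graph, source):
    """Breadth-first search returning visit order, distances,
    shortest-path counts and predecessors on shortest paths."""
    dist = {source: 0}
    sigma = {u: 0 for u in graph}
    sigma[source] = 1
    preds = {u: [] for u in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds


def degree_centrality(graph):
    """Number of direct connections of each user."""
    return {u: len(neighbours) for u, neighbours in graph.items()}


def closeness_centrality(graph):
    """Reachable users divided by the sum of distances to them."""
    result = {}
    for u in graph:
        _, dist, _, _ = _bfs(graph, u)
        total = sum(dist.values())
        result[u] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(graph):
    """Number of shortest paths between other users that pass through each user."""
    score = {u: 0.0 for u in graph}
    for s in graph:
        order, _, sigma, preds = _bfs(graph, s)
        delta = {u: 0.0 for u in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is visited from both ends in an undirected graph.
    return {u: b / 2 for u, b in score.items()}


# Toy network: user "a" is connected to everyone else.
toy = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
```

On the toy network, user `a` has the highest value for all three measures, which matches the intuition that a user sitting between all others plays a central role. For a directed follower graph, indegree and outdegree would have to be counted separately.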

All criteria described above will be used jointly to identify topic-specific influencers.

However, in reality, not all criteria will be treated as equally important. In fact, the weight of each criterion will depend on the application (the reason for identifying influencers). Since the weights change from one application to another, it is not up to us to decide on them. Therefore, our project focuses on a general approach that does not assign different weights to the criteria. We leave the assignment of weights to those who use our framework to identify

influencers for various applications.
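The quantitative measures in equations (1) to (4), together with a caller-supplied weighting, could be sketched as follows. Several details are assumptions rather than part of the framework: ages are expressed in days, normalisation is done with min-max scaling, and the helper `weighted_score` is a hypothetical way for a user of the framework to plug in application-specific weights.

```python
def average_activity_rate(tweets_posted, account_age_days):
    """Equation (1): tweets posted divided by the age of the account."""
    return tweets_posted / account_age_days


def min_max_normalise(values):
    """Scale numbers to the range [0, 1] (assumed normalisation method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def activity_rate_and_age(norm_rate, norm_age):
    """Equation (2): equal-weight mix of the two normalised measures."""
    return 0.5 * norm_rate + 0.5 * norm_age


def average_retweet_frequency(avg_retweets, avg_tweet_age_days):
    """Equation (3): average retweets divided by the average tweet age."""
    return avg_retweets / avg_tweet_age_days


def tff_ratio(followers, followees):
    """Equation (4): followers divided by followees.
    Returns infinity when the user follows nobody."""
    if followees == 0:
        return float("inf")
    return followers / followees


def weighted_score(criteria, weights):
    """Weighted average of normalised criterion scores.
    The weights are supplied by whoever applies the framework."""
    total = sum(weights.values())
    return sum(weights[name] * criteria[name] for name in weights) / total
```

For the recognition example, `average_retweet_frequency(30, 3)` gives 10.0 for the three-day-old tweet, while `average_retweet_frequency(30, 30)` gives 1.0 for the thirty-day-old tweet, so the newer tweet scores higher.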

4.4 Evaluation

In order to determine the reliability of the proposed framework, it will be evaluated against a

dataset of Twitter user accounts. The proposed framework will be applied to the data to identify a

set of relevant topic-specific influencers. In order to compare the performance of our framework,

various approaches proposed by other researchers will be applied to the same data to determine

their outcome. The influencers determined by each approach will be compared with our result.
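One possible way to compare the influencers returned by two approaches is to take the top-ranked users from each and measure how much the two sets overlap. The report does not fix a comparison measure, so the Jaccard overlap used below is an assumption, and the function names, user names and scores are purely illustrative.

```python
def top_k(scores, k):
    """Return the k highest-scoring users, breaking ties by user name."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [user for user, _ in ranked[:k]]


def jaccard_overlap(users_a, users_b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(users_a), set(users_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Hypothetical scores produced by two different approaches.
ours = {"u1": 0.9, "u2": 0.7, "u3": 0.2, "u4": 0.1}
baseline = {"u1": 0.5, "u2": 0.1, "u3": 0.8, "u4": 0.3}

overlap = jaccard_overlap(top_k(ours, 2), top_k(baseline, 2))
```

A low overlap would indicate that the approaches disagree on who the influencers are, which is where a closer look at the individual accounts becomes useful.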

Since most approaches were based on quantitative data and algorithms, we expect the final results to be different. This is because our approach takes into account qualitative measures such as analysing the context of each tweet and combining the criteria from both the traditional and online social network context. For instance, when Apple introduced the new Apple Watch, many Twitter users shared what they liked or disliked about the product by mentioning Apple’s Twitter account in their tweets. They mentioned Apple’s account instead of just ‘Apple’ because they hoped that Apple would notice their tweets and respond to them. As a result, the PageRank algorithm would consider the Apple account an influencer, which should not be the case since Apple did not do anything. Therefore, it is necessary to look at more than one criterion to

identify influencers and it is also important to look at the content of the tweets.
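The effect described in the Apple example can be reproduced on a toy mention graph. The sketch below is a plain power-iteration PageRank, assuming that an edge points from a user to each account mentioned by that user; the account names, the damping factor of 0.85 and the number of iterations are illustrative choices, not values taken from our data.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each user
    to the list of accounts that the user mentions."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by accounts with no outgoing mentions is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in out_links.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Hypothetical mentions: most users mention "apple", which mentions nobody.
mentions = {
    "alice": ["apple"],
    "bob": ["apple"],
    "carol": ["apple", "alice"],
    "dave": ["apple"],
    "apple": [],
}

ranks = pagerank(mentions)
most_influential = max(ranks, key=ranks.get)
```

Here `most_influential` is `"apple"`, even though the account posted nothing, which illustrates why a link-based score alone can be misleading and why the content of the tweets also has to be examined.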


5. Project Plan

[Gantt chart not reproduced here. Key: important dates; exam period.]

Table 2: Project plan Gantt chart

The literature review will take up most of the project time because various research papers have to be read in order to know what has been done by other researchers in the field and to generate ideas for designing our framework. The framework design phase was also given a longer period because the proposed framework is still in its early stages and will continue to evolve as more insights are gained from brainstorming and other research papers.

Next, we have to evaluate our proposed framework by identifying a current topic on Twitter

and a manageable dataset of Twitter accounts. Progress on these phases was delayed and the expected finish time was shifted. This is because problems were encountered during data collection using the API, due to the recent changes in the authentication method made by Twitter. The grey area in Table 2 represents the examination period, during which more attention has to be given to revision for other modules.

Once the exam period is over, we will start gathering a dataset of user accounts for the

chosen Twitter topic. At the same time, we have to monitor the life-cycle of the chosen topic and

use this data together with the user accounts to gain insights into influential users. Both of these phases are expected to take at least seven weeks. They were given a long period because extra work has to be done to filter the data, such as differentiating personal accounts from non-personal accounts. Besides, it is important to eliminate irrelevant data carefully

so that we will have a manageable dataset for the evaluation of the framework.

The evaluation stage will involve testing of the data collected against the list of criteria

described in the proposed framework to identify a set of influencers. We will also compare our

framework to others by applying other frameworks to the same dataset to observe the outcome. A four-week period was also set aside as a contingency in case more time is required for data collection and evaluation. Finally, sufficient time was allocated to write a high-quality dissertation.


6. Conclusion

The aim of the project is to determine key influencers in online social media. The motivation

for undertaking this project is the rise in popularity of viral marketing as a strategy for marketers to promote their brands and products. In order to do this, they have to

accurately identify online influencers. Twitter was chosen as the research platform because Twitter

is more of a broadcast medium compared to other social media platforms, which makes it the

perfect platform for marketers to launch marketing campaigns.

Related work suggested that activity generation, eloquence, novelty, and recognition form the basic criteria for influencers. Over the past few years, a few other criteria have also been added to the list. Furthermore, researchers have typically used the number of followers, retweets and mentions to measure influence. However, influencers are found to be far more complicated and cannot be accurately identified using quantitative measures alone. Therefore, we proposed a framework

which looked at influencers in both the traditional and online social network context. Furthermore,

qualitative measures such as the quality of contents posted by the user were taken into account. It is

important to note that all the criteria should be used jointly to accurately measure influence.

Overall, the project is progressing as planned despite some minor problems faced during

data collection. In addition, sufficient time was assigned for each project phase to ensure that the

project could be completed on time.


References

Agarwal, N., et al. (2008). Identifying the influential bloggers in a community. Proceedings of the 2008 international conference on web search and data mining, ACM.

Akritidis, L., et al. (2009). Identifying influential bloggers: Time does matter. Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT'09. IEEE/WIC/ACM International Joint Conferences on, IET.

Leavitt, A., et al. (2009). The Influentials: New Approaches for Analyzing Influence on Twitter. Web Ecology Project.

Bennett, S. (2012) Twitter Users Often Retweet Without Reading Or Clicking Links, Study Reveals.

Keller, E. and J. Berry (2003). "The Influentials." Concentrated Knowledge for the Busy Executives 25(5).

Cha, M., et al. (2010). "Measuring User Influence in Twitter: The Million Follower Fallacy." ICWSM 10(10-17): 30.

Domingos, P. (2005). "Mining social networks for viral marketing." IEEE Intelligent Systems 20(1): 80-82.

Brown, P. E. and J. Feng (2011). "Measuring user influence on twitter using modified k-shell decomposition."

Hanneman, R. A. and M. Riddle (2005). Introduction to social network methods, University of California Riverside.

Kemp, S. (2015) Digital, Social & Mobile Worldwide in 2015.


Kwak, H., et al. (2010). What is Twitter, a social network or a news media? Proceedings of the 19th international conference on World wide web, ACM.

McPherson, M., et al. (2001). "Birds of a feather: Homophily in social networks." Annual review of sociology: 415-444.

Mowday, R. T. (1978). "The exercise of upward influence in organizations." Administrative Science Quarterly: 137-156.

O'Reilly, T. (2009). What is Web 2.0. O'Reilly Media, Inc.

Page, L., et al. (1999). "The PageRank citation ranking: Bringing order to the web."

Provost, L. (2012) Achat de followers sur Twitter: nous avons fait le test et acheté 50.000 abonnés.

Quercia, D., et al. (2011). In the mood for being influential on twitter. Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, IEEE.

Romero, D. M., et al. (2011). Influence and passivity in social media. Machine learning and knowledge discovery in databases, Springer: 18-33.

Tedeschi, J. T., et al. (1972). "The exercise of power and influence: The source of influence." The social influence processes: 287-345.

Watts, D. J. and P. S. Dodds (2007). "Influentials, networks, and public opinion formation." Journal of consumer research 34(4): 441-458.

Weng, J., et al. (2010). Twitterrank: finding topic-sensitive influential twitterers. Proceedings of the third ACM international conference on Web search and data mining, ACM.


Westbrook, R. A. (1987). "Product/consumption-based affective responses and postpurchase processes." Journal of marketing research: 258-270.

Wilson, R. F. (2000). "The six simple principles of viral marketing." Web Marketing Today 70(1): 232.

Wu, S., et al. (2011). Who says what to whom on twitter. Proceedings of the 20th international conference on World wide web, ACM.

Ya-ting, L. and C. Jing-min (2011). The social network analysis of political blogs in people: Based on centrality. Consumer Electronics, Communications and Networks (CECNet), 2011 International Conference on, IEEE.

Zhou, H., et al. (2009). Finding leaders from opinion networks. Intelligence and Security Informatics, 2009. ISI'09. IEEE International Conference on, IEEE.
