Knowledge discovery in social media mining for market analysis

By: Senuri Wijenayake

Introduction

Problem Addressed: Three research areas in Social Media Mining: Predictive Power, Community Detection, and Influence Propagation

Focus: Analyze the existing literature and find applications of Social Media for Knowledge Discovery in Market Analysis

Background

Fact 1: Facebook had over 1.55 billion active users as of November 2015 (extracted from Statistics Portal, November 2015)

Fact 2: All adults spend at least 2 hours a day on some form of social media network

Focus of Research

A rich source of data with human sentiment and behavior

Developed online relationships and groups

Online interactions where people voice their opinions

Understand customer satisfaction and changing customer requirements

Focused marketing campaigns for better results

Influencing consumer behavior effectively via influential users

Using Social Media to make Predictions

Progress So Far:

Human Intuition: can't be duplicated

Data Based Models: inadequate data to represent the human cognitive process

SOLUTION: Use data available on social media for predictive analysis.

Using Social Media to make Predictions

Progress So Far:

Yahoo Finance Message Board – Stock market variability (Antweiler & Frank 2004)

Google Search Queries – Track disease outbreaks (Ginsberg et al. 2009)

Amazon Reviews – Predicting product sales (Ghose & Ipeirotis 2011)

General Framework for SMM for Predictions

Stage 1: Preprocessing

Social media data are unstructured. Convert them into high quality structured data, suitable for data mining.

Quality (Strong et al. 1997): Objectivity, Completeness, Sufficiency

Stage 2: Predictive Analysis

Develop a model to make accurate predictions on a new set of data (Harold 2013)

Methodologies: Market Models, Survey Models, Statistical Models

Data Preprocessing

Data Cleaning

Problem: Missing values, noise, outliers

Solution: Substitution, regression

Data Integration

Problem: Entity identification, redundancy

Solution: Schema based entity identification, duplicate detection

Data Transformation

Problem: Data can't be used straight away for mining

Solution: Generalization, attribute construction

Data Reduction

Problem: Large amounts of data require significant processing power

Solution: Data cube aggregation, attribute selection
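
The four preprocessing steps can be illustrated with a minimal sketch. It assumes a toy list of tweet records with hypothetical field names (user, text, likes). Mean substitution, exact-duplicate detection, a derived word-count attribute, and simple attribute selection stand in for the techniques named on the slide; none of this is taken from the cited papers.

```python
from statistics import mean

# Hypothetical raw records; the field names are illustrative only.
raw = [
    {"user": "a", "text": "Great movie!", "likes": 10},
    {"user": "b", "text": "Boring plot", "likes": None},   # missing value
    {"user": "a", "text": "Great movie!", "likes": 10},    # duplicate
    {"user": "c", "text": "Loved the songs", "likes": 4},
]

def clean(records):
    """Data cleaning: substitute missing 'likes' with the mean of known values."""
    fill = mean(r["likes"] for r in records if r["likes"] is not None)
    return [{**r, "likes": fill if r["likes"] is None else r["likes"]}
            for r in records]

def integrate(records):
    """Data integration: drop exact duplicates (same user and text)."""
    seen, unique = set(), []
    for r in records:
        key = (r["user"], r["text"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def transform(records):
    """Data transformation: construct a new attribute (word count)."""
    return [{**r, "n_words": len(r["text"].split())} for r in records]

def select_attributes(records, keep=("user", "likes", "n_words")):
    """Data reduction: keep only the attributes needed for mining."""
    return [{k: r[k] for k in keep} for r in records]

prepared = select_attributes(transform(integrate(clean(raw))))
```

Each step is a separate function so that a different technique (for example, regression instead of mean substitution) could be swapped in without touching the others.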

Application of Predictions in Market Analysis

Objective: How can the available knowledge be used to make predictions for market analysis, and how successful is it?

Microblogging (Twitter) is the most popular

Focus: Twitter data for predicting box office performance of movies

Application of Predictions in Market Analysis

Literature:

Asur & Huberman (2010) used correlation and regression based models on Twitter data

Leskovec (2011) rectified imperfections which could arise due to incomplete data

Vasu Jain (2013) used sentiment analysis for predictions

Gaikar & Marakarkandy (2015) introduced a framework for using Twitter data for sentiment analysis and making predictions

Application of Predictions in Market Analysis

Gaikar & Marakarkandy (2015)

Predict box office performance of a Bollywood movie as a hit, a flop or an average

Predict the opening weekend revenue collection

Twitter for Predictions: Methodology

Module 1: Data Extraction

The most trending hashtag on Twitter and related hashtags are extracted (HashTags.org)

Twitter4J API used to connect and extract tweets from Twitter servers

Stored in a MySQL database

Movie star ratings taken from Times Celebex

A complete set of the most relevant data has been extracted

Module 2: Sentiment Analysis

Module 3: Predictive Analysis

Predicting movie performance

Input: Sentiment score + Movie Star Rating

Process: A fuzzy inference based model is created

Output: Box office movie performance as Hit, Flop or Average
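
A fuzzy inference step of this kind can be sketched as follows. This is an illustration only: it assumes both inputs are scaled to [0, 1], uses triangular membership functions, and relies on a hypothetical rule table. The actual membership functions and rules of Gaikar & Marakarkandy (2015) are not given in these slides. The sketch also skips defuzzification and simply returns the class with the strongest rule activation.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x):
    """Degrees of membership of a value in [0, 1] in low / medium / high."""
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }

# Hypothetical rule table: (sentiment term, star rating term) -> outcome.
RULES = {
    ("high", "high"): "Hit",
    ("high", "medium"): "Hit",
    ("medium", "high"): "Hit",
    ("medium", "medium"): "Average",
    ("high", "low"): "Average",
    ("low", "high"): "Average",
    ("medium", "low"): "Flop",
    ("low", "medium"): "Flop",
    ("low", "low"): "Flop",
}

def predict(sentiment, star_rating):
    """Return Hit, Average or Flop for inputs scaled to [0, 1]."""
    s, r = fuzzify(sentiment), fuzzify(star_rating)
    strength = {"Hit": 0.0, "Average": 0.0, "Flop": 0.0}
    for (s_term, r_term), label in RULES.items():
        fired = min(s[s_term], r[r_term])             # fuzzy AND as minimum
        strength[label] = max(strength[label], fired)  # fuzzy OR as maximum
    return max(strength, key=strength.get)

label = predict(0.9, 0.8)  # strongly positive tweets, well rated cast
```

With these assumed rules, a movie with strongly positive sentiment and a highly rated cast is classified as a Hit, while low values on both inputs give a Flop.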

Module 3: Predictive Analysis

Predicting weekend collection

Input: Hype factor, shows per day on all screens, average full house collection

Process:

Output: Estimated opening weekend collection

Twitter for Predictions: Findings & Evaluation

10269 tweets for 14 movies released in a period of six months (relevant, complete, sufficient) were considered

Actor ratings in the month of release were considered

Predictions were compared against the real ratings extracted from IMDb (near perfect predictions)

Mean Square Error was used to evaluate the effectiveness of the predictive model (<7% error rate)
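
For reference, the Mean Square Error over n predictions is the average of the squared differences between actual and predicted values. A minimal sketch follows; the numbers are hypothetical and are not the study's data, and how the reported error rate was derived from the MSE is not stated in the slides.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical ratings for three movies (not from the study).
actual = [7.0, 5.5, 8.0]
predicted = [6.5, 6.0, 8.0]
mse = mean_squared_error(actual, predicted)
```

Because the differences are squared, a single large miss is penalized more heavily than several small ones.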

Twitter for Predictions: Findings & Evaluation

Twitter for Predictions: Applications

If the predicted revenue < budgeted revenue, increase marketing and publicity efforts

Can determine the maximum allowable promotional budget

Limitations:

Only two predictor variables used to predict box office performance (sentiment score + actor rating)

Use more variables

Using Social Media for Community Detection

60% of the American population choose social media as their first choice for information seeking (Scot et al. 2014)

Social relationships transferred to the internet

Online communities based on similar interests and opinions have been created

Opinion based community detection can be used to identify such online communities

Literature:

Park & Cho (2012) identified online communities as an information source for apparel shopping

Dev (2014) proposed an algorithm for community detection in social media based on different interaction methods (no opinion mining)

Kavoura (2014) identified the impact of online communities for communication

Dinsoreanu & Potolea introduced a framework for opinion based community detection in social media

Using Social Media for Community Detection

Data Preparation:

Extracted user comments from blog posts and forums

A classification model for opinion mining was created using a set of labelled documents and the 5 grammar rules introduced by Turney (2002)

Extracted tokens (after filtering) are classified into positive and negative opinions using SVM and Naive Bayes (NB). A sentiment score is assigned to each token.

Tokens are stored in a structured format (includes the id, holder, opinion keyword, polarity score etc.)
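
The Naive Bayes part of this classification step can be sketched in a few lines. This is a generic multinomial Naive Bayes with add-one smoothing, trained on a tiny hypothetical set of labelled token lists. It is not the framework's actual model, and the SVM classifier and the Turney grammar rules are not shown.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            # add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((word_counts[label][t] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled training documents.
train = [
    (["great", "phone"], "positive"),
    (["love", "battery"], "positive"),
    (["poor", "battery"], "negative"),
    (["terrible", "screen"], "negative"),
]
model = train_nb(train)
polarity = predict_nb(model, ["poor", "screen"])
```

The predicted label could then be stored alongside the id, holder and opinion keyword in the structured record described above.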

Community Detection: Methodology

Opinion based Community Detection:

Identifying communities based on similar interests in multiple targets

Aggregate functions to represent the similarity of opinions in multiple targets

Similarity graphs based on Euclidean distance were drawn
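
A similarity graph based on Euclidean distance can be sketched as below. The user names, polarity scores, and distance threshold are hypothetical, and treating connected components as communities is a simplification chosen for this example. The aggregate functions used in the cited framework are not reproduced in these slides.

```python
import math
from itertools import combinations

# Hypothetical polarity scores per user for three targets, each in [-1, 1].
opinions = {
    "u1": [0.9, 0.8, -0.2],
    "u2": [0.8, 0.9, -0.1],
    "u3": [-0.7, -0.9, 0.6],
    "u4": [-0.8, -0.8, 0.5],
}

def euclidean(a, b):
    """Euclidean distance between two equal-length opinion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_graph(vectors, max_distance):
    """Connect two users when their opinion vectors are close enough."""
    graph = {u: set() for u in vectors}
    for u, v in combinations(vectors, 2):
        if euclidean(vectors[u], vectors[v]) <= max_distance:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def communities(graph):
    """Return the connected components of the graph as sets of users."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result

groups = communities(similarity_graph(opinions, max_distance=0.5))
```

Here users u1 and u2 share similar opinions on all three targets and end up in one community, while u3 and u4 form another.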

Opinion based Community Detection: Similarity Functions:

Community Detection: Findings & Evaluation

1000 labelled documents used as the training set for NB and SVM

Near perfect classification of opinions can be obtained

A user generated data set was used to apply community detection algorithms

Findings:

Linear functions perform poorly when the number of targets increases

Exponential functions with cutoff perform best with increasing opinions

Community Detection: Limitations

A practical application of community detection was not conducted

Suggestion: The proposed framework can be applied in the pharmaceutical industry for online community detection

Background Literature: “CyberRx” by Radar & Subhan (2013)

Community Detection: Potential Application

Data Collection

CyberRx: Forums and blogs using Google Alerts

New Approach: Additional sources such as bulletin boards

Keywords Used

CyberRx: Formal names and language

New Approach: More popular brand names and consumer driven language

Opinion Mining

CyberRx: Manual

New Approach: Automated (SVM classifier)

Community Detection

CyberRx: Manual

New Approach: Aggregated functions and similarity graphs

Findings

CyberRx: Two main communities (side effects, medications; changing medication)

New Approach: More specific communities can be identified

Community Detection: Potential Application

Knowledge such as:

Most prevalent diseases classified based on geography and demography

Most popularly used brands of drugs

Competing alternatives for a given drug

Information on specifications, variations, duration and personal experience of side effects (both normal and abnormal)

Using Social Media for Influence Propagation

People influence each other via online interactions and communications

Purchase decisions are heavily influenced by eWoM (electronic word of mouth) in social media networks

34% of Twitter users post product related opinions at least once a week (ROI Research Institute)

Objective: Target the most influential users on social media to activate a chain of influence driven by eWoM

Literature:

Khobzi (2014) conducted a basic content based analysis on Facebook posts, to identify the connection between the sentiment and the popularity of the post

Kaiser et al. (2012) analyzed opinion formation and influential users based on data collected on iPhone reviews

Okazaki et al. (2014) explored the different types of customer engagement in social media networks and their impact on influence propagation

Using Social Media for Influence Propagation

Influence Propagation: Methodology

Focus group: IKEA customers

Training set included 300 preprocessed tweets

Classified manually based on customer emotional status and content

Emotional Status: Satisfied, Dissatisfied, Neutral

Content: Information, Sharing, Opinion, Question,

Trained NB, KNN and SVM classifiers

NB performed best

Influence Propagation: Application

New data set: 4000 tweets

Users were seen as nodes and tweets as their relationships

Google’s PageRank algorithm was used to determine the relative importance of each user

Findings:

One satisfied user sharing information (positive eWoM)

Three dissatisfied users spreading negative opinions
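
The ranking step can be sketched with a small power-iteration version of PageRank. The edge list is hypothetical, and the edge direction is an assumption made for this example: an edge (A, B) means user A retweeted or mentioned user B, so importance flows towards B. The slides do not specify how tweets were turned into edges.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n
                    for u in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical interactions: (A, B) means A retweeted or mentioned B.
edges = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
]
ranks = pagerank(edges)
most_influential = max(ranks, key=ranks.get)
```

Combining each user's rank with the emotional status predicted by the classifier would then show whether the most influential users are spreading positive or negative eWoM.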

Influence Propagation: Suggestions

Conclusion:

Influential users can be identified

Different customer satisfaction levels are crucial

Suggestions:

Use celebrities and convert their followers into influence makers

Additional incentives could be provided to encourage engagement in discussions

Closely monitor for dissatisfied customers online and occasionally mediate in retweets, suggesting feasible solutions and demonstrating commitment

Knowledge Discovery in SMM: Conclusions

Consolidates the potential knowledge areas that could be exploited for market analysis via the predictive power of, community detection in, and influence propagation in social media.

Properly preprocessed social media data of acceptable quality, when applied to robust statistical models, could predict future market trends with considerable accuracy.

Social media have taken social relationships to the digital platform and created opinion based communities online. These can be used to identify genuine consumer requirements.

Knowledge Discovery in SMM: Conclusions

People express their genuine consumer experiences on social media networks which clearly influence purchasing decisions of other potential consumers.

An efficient framework can identify influential users online and trigger a chain of positive eWoM promoting viral marketing.

Questions & Answers
