Date posted: 18-Feb-2017
Category: Technology
Uploaded by: senuri-wijenayake
Knowledge Discovery in Social Media Mining for Market Analysis
By: Senuri Wijenayake
Introduction
Problem Addressed: Three research areas in social media mining: predictive power, community detection and influence propagation
Focus: Analyze the existing literature and identify applications of social media for knowledge discovery in market analysis
Background
Fact 1: Facebook had over 1.55 billion active users as of November 2015 (source: Statistics Portal, November 2015)
Fact 2: Adults typically spend at least 2 hours a day on some form of social media network
Focus of Research
A rich source of data with human sentiment and behavior
Developed online relationships and groups
Online interactions where people voice their ideas
Understand customer satisfaction and changing customer requirements
Focused marketing campaigns for better results
Influencing consumer behavior effectively via influential users
Using Social Media to make Predictions
Progress So Far: Human intuition cannot be duplicated; data-based models lack adequate data to represent the human cognitive process.
SOLUTION: Use data available on social media for predictive analysis.
Progress So Far: Yahoo Finance Message Board – Stock market variability (Antweiler & Frank 2004)
Google Search Queries – Track disease outbreaks (Ginsberg et al. 2009)
Amazon Reviews – Predicting product sales (Ghose & Ipeirotis 2011)
General Framework for SMM for Predictions
Stage 1: Preprocessing. Social media data are unstructured; convert them into high-quality structured data suitable for data mining.
Quality criteria (Strong et al. 1997): objectivity, completeness, sufficiency
Stage 2: Predictive Analysis. Develop a model that makes accurate predictions on a new set of data (Harold 2013).
Methodologies: market models, survey models, statistical models
Data Preprocessing
| Step | Problem | Solution |
| Data Cleaning | Missing values, noise, outliers | Substitution, regression |
| Data Integration | Entity identification, redundancy | Schema-based entity identification, duplicate detection |
| Data Transformation | Data can't be used straight away for mining | Generalization, attribute construction |
| Data Reduction | Large amounts of data require significant processing power | Data cube aggregation, attribute selection |
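The cleaning, integration and transformation steps in the table can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the author's pipeline: the record fields (`text`, `retweets`) are hypothetical, and mean substitution, normalized-text duplicate detection and min-max scaling are chosen here only as simple instances of each step.

```python
from statistics import mean


def preprocess(records):
    """Illustrative preprocessing of tweet-like records with hypothetical 'text' and 'retweets' fields."""
    if not records:
        return []

    # Data cleaning: substitute missing retweet counts with the mean of the observed counts.
    observed = [r["retweets"] for r in records if r["retweets"] is not None]
    fill = mean(observed) if observed else 0
    cleaned = [{**r, "retweets": fill if r["retweets"] is None else r["retweets"]} for r in records]

    # Data integration: duplicate detection on case- and whitespace-normalized text.
    seen, unique = set(), []
    for r in cleaned:
        key = " ".join(r["text"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Data transformation: min-max scale retweet counts into [0, 1].
    counts = [r["retweets"] for r in unique]
    low, high = min(counts), max(counts)
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return [{**r, "retweets_scaled": (r["retweets"] - low) / span} for r in unique]


records = [
    {"text": "Great movie!", "retweets": 10},
    {"text": "great   movie!", "retweets": 4},  # duplicate once normalized
    {"text": "Boring plot", "retweets": None},  # missing value
    {"text": "Loved it", "retweets": 2},
]
for row in preprocess(records):
    print(row["text"], round(row["retweets_scaled"], 3))
```

Mean substitution runs before duplicate removal here, so the dropped duplicate still contributes to the fill value; a real pipeline might order these steps differently.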
Application of Predictions in Market Analysis
Objective: How can the available knowledge be used to make predictions for market analysis, and how successful is it? Microblogging (Twitter) is the most popular source.
Focus: Twitter data for predicting the box office performance of movies
Literature: Asur & Huberman (2010) used correlation- and regression-based models on Twitter data.
Leskovec (2011) rectified imperfections that could arise from incomplete data.
Vasu Jain (2013) used sentiment analysis for predictions.
Gaikar & Marakarkandy (2015) introduced a framework for using Twitter data for sentiment analysis and making predictions.
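Correlation- and regression-based prediction can be illustrated with ordinary least squares on a single predictor plus a Pearson correlation coefficient. This is a hypothetical sketch: the tweet-rate and revenue figures are invented for illustration, and the code is not the model used in any of the studies listed.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(xs, ys):
    """Pearson correlation coefficient between predictor and outcome."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


tweet_rate = [100, 200, 300, 400]  # hypothetical tweets per hour before release
revenue = [5, 10, 12, 17]          # hypothetical opening revenue, in millions
intercept, slope = fit_line(tweet_rate, revenue)
print(f"revenue = {intercept:.2f} + {slope:.3f} * tweet_rate, r = {pearson_r(tweet_rate, revenue):.3f}")
print("forecast for 500 tweets/hour:", intercept + slope * 500)
```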
Gaikar & Marakarkandy (2015):
Predict the box office performance of a Bollywood movie as a hit, flop or average
Predict the opening weekend revenue collection
Twitter for Predictions: Methodology
Module 1: Data Extraction
The most trending hashtag on Twitter and related hashtags are extracted (HashTags.org).
The Twitter4j API is used to connect to the Twitter servers and extract tweets.
Tweets are stored in a MySQL database.
Movie star ratings are taken from Times Celebex.
A complete set of the most relevant data has been extracted.
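The filtering and storage part of this module can be sketched in Python. Twitter4j is a Java library and is not reproduced here; the sketch assumes tweets have already been fetched as `(id, text)` pairs and swaps in Python's built-in `sqlite3` for the MySQL database. The table layout and the hashtag `#MovieX` are hypothetical.

```python
import re
import sqlite3

HASHTAG = re.compile(r"#(\w+)")


def store_relevant_tweets(conn, tweets, target_tags):
    """Insert (id, text) tweets that mention any target hashtag; return the number of stored rows."""
    targets = {tag.lower().lstrip("#") for tag in target_tags}
    conn.execute("CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    for tweet_id, text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if tags & targets:
            # INSERT OR IGNORE skips tweets whose id is already stored.
            conn.execute("INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)", (tweet_id, text))
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


conn = sqlite3.connect(":memory:")
sample = [
    (1, "Can't wait for #MovieX this Friday!"),
    (2, "Lunch time"),
    (3, "#moviex was amazing #MustWatch"),
    (1, "Can't wait for #MovieX this Friday!"),  # duplicate id, ignored
]
print(store_relevant_tweets(conn, sample, ["#MovieX"]))
```

Hashtags are compared case-insensitively, so `#moviex` and `#MovieX` count as the same tag.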
Module 2: Sentiment Analysis
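The slides do not show how the sentiment score is computed. As a hypothetical stand-in, the sketch below uses a tiny hand-made lexicon and scores a set of tweets as (positive - negative) / (positive + negative), which gives a value in [-1, 1]. The word lists and the formula are assumptions, not the method of Gaikar & Marakarkandy (2015).

```python
import re

# Hypothetical toy lexicon; the slides do not say which lexicon or classifier was used.
POSITIVE = {"good", "great", "amazing", "love", "loved", "superb"}
NEGATIVE = {"bad", "boring", "awful", "hate", "worst", "flop"}
WORD = re.compile(r"[a-z']+")


def sentiment_score(tweets):
    """Return (pos - neg) / (pos + neg) over all tweets, or 0.0 if no opinion words occur."""
    pos = neg = 0
    for text in tweets:
        for word in WORD.findall(text.lower()):
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    total = pos + neg
    return (pos - neg) / total if total else 0.0


print(sentiment_score(["Loved it, amazing songs", "A bit boring in the middle", "Great acting"]))
```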
Module 3: Predictive Analysis (predicting movie performance)
Input: sentiment score + movie star rating
Process: a fuzzy inference-based model is created
Output: box office movie performance as Hit, Flop or Average
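A fuzzy inference step of this kind can be sketched with triangular membership functions and min/max rule evaluation (Mamdani-style). Everything specific here is hypothetical: the input ranges (sentiment score in [-1, 1], star rating in [0, 5]), the membership functions and the rule base are not given in the slides. For simplicity the class with the strongest aggregated rule activation is returned instead of a defuzzified value.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzify(sentiment, rating):
    """Degrees of membership for sentiment in [-1, 1] and star rating in [0, 5] (hypothetical sets)."""
    s = {
        "negative": tri(sentiment, -2, -1, 0),
        "neutral": tri(sentiment, -1, 0, 1),
        "positive": tri(sentiment, 0, 1, 2),
    }
    r = {
        "low": tri(rating, -2.5, 0, 2.5),
        "medium": tri(rating, 0, 2.5, 5),
        "high": tri(rating, 2.5, 5, 7.5),
    }
    return s, r


# Hypothetical rule base: (sentiment set, rating set) -> performance class.
RULES = [
    ("positive", "high", "Hit"),
    ("positive", "medium", "Hit"),
    ("positive", "low", "Average"),
    ("neutral", "high", "Average"),
    ("neutral", "medium", "Average"),
    ("neutral", "low", "Flop"),
    ("negative", "high", "Average"),
    ("negative", "medium", "Flop"),
    ("negative", "low", "Flop"),
]


def predict_performance(sentiment, rating):
    """Return 'Hit', 'Average' or 'Flop' for the strongest aggregated rule activation."""
    s, r = fuzzify(sentiment, rating)
    strength = {"Hit": 0.0, "Average": 0.0, "Flop": 0.0}
    for s_set, r_set, outcome in RULES:
        # AND = min of the two memberships; rules sharing an outcome are combined with max.
        strength[outcome] = max(strength[outcome], min(s[s_set], r[r_set]))
    return max(strength, key=strength.get)


print(predict_performance(0.8, 4.5))
print(predict_performance(-0.7, 1.0))
```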
Module 3: Predictive Analysis (predicting opening weekend collection)
Input: hype factor, shows per day on all screens, average full-house collection
Process:
Output: estimated opening weekend collection
Twitter for Predictions: Findings & Evaluation
10,269 tweets about 14 movies released over a six-month period (relevant, complete, sufficient) were considered.
Actor ratings in the month of release were considered.
Predictions were compared against the real ratings extracted from IMDb (near-perfect predictions).
Mean Square Error was used to evaluate the effectiveness of the predictive model (<7% error rate).
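For reference, the mean square error over n predictions is MSE = (1/n) * sum of (predicted - actual)^2. A minimal sketch with made-up ratings (the values are hypothetical, not the study's data):

```python
def mean_squared_error(predicted, actual):
    """Average of squared differences between predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predicted vs. actual ratings for three movies.
print(mean_squared_error([7.5, 6.0, 8.2], [7.0, 6.5, 8.2]))
```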
Twitter for Predictions: Applications
If the predicted revenue is below the budgeted revenue, increase marketing and publicity efforts.
The prediction can be used to determine the maximum allowable promotional budget.
Limitations: Only two predictor variables are used to predict box office performance (sentiment score + actor rating); more variables should be used.
Using Social Media for Community Detection
60% of the American population chose social media as their first choice for information seeking (Scot et al. 2014)
Social relationships have been transferred to the internet
Online communities based on similar interests and opinions have been created
Opinion based community detection can be used to identify such online communities
Literature: Park & Cho (2012) identified online communities as an information source for apparel shopping.
Dev (2014) proposed an algorithm for community detection in social media based on different interaction methods (no opinion mining).
Kavoura (2014) identified the impact of online communities for communication
Dinsoreanu & Potolea introduced a framework for opinion based community detection in social media
Data Preparation: User comments were extracted from blog posts and forums.
A classification model for opinion mining was created using a set of labelled documents and the five grammar rules introduced by Turney (2002).
Extracted tokens (after filtering) are classified into positive and negative opinions using Support Vector Machine (SVM) and Naive Bayes (NB) classifiers, and a sentiment score is assigned to each token.
Tokens are stored in a structured format (including the id, holder, opinion keyword, polarity score, etc.).
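The Naive Bayes half of this classification step can be sketched as a multinomial NB classifier with add-one (Laplace) smoothing, written without external libraries. The training tokens and labels are hypothetical, and neither the SVM classifier nor Turney's grammar rules are reproduced here.

```python
import math
from collections import Counter


def train_nb(documents):
    """Train multinomial Naive Bayes from (tokens, label) pairs."""
    label_counts = Counter(label for _, label in documents)
    token_counts = {label: Counter() for label in label_counts}
    for tokens, label in documents:
        token_counts[label].update(tokens)
    vocabulary = {token for counts in token_counts.values() for token in counts}
    return label_counts, token_counts, vocabulary


def classify_nb(model, tokens):
    """Return the label with the highest log-posterior for the given tokens."""
    label_counts, token_counts, vocabulary = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        total = sum(token_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore tokens never seen in training
            # Add-one smoothing keeps unseen label/token pairs from zeroing the product.
            score += math.log((token_counts[label][token] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    (["excellent", "service"], "positive"),
    (["love", "this", "phone"], "positive"),
    (["terrible", "battery"], "negative"),
    (["hate", "this", "update"], "negative"),
]
model = train_nb(training)
print(classify_nb(model, ["love", "battery", "excellent"]))
print(classify_nb(model, ["terrible", "update"]))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied together.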
Community Detection: Methodology
Opinion based Community Detection:
Identifying communities based on similar interests in multiple targets
Aggregate functions to represent the similarity of opinions in multiple targets
Similarity graphs based on Euclidean distance were drawn
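The similarity-graph idea can be sketched as follows: each user is a vector of opinion polarities over several targets, users within a Euclidean-distance cutoff of each other are joined by an edge, and connected components are read off as communities. The edge weight exp(-d) with a cutoff is a hypothetical choice that only echoes the reported finding that exponential functions with a cutoff perform best; it is not the similarity function of Dinsoreanu & Potolea, and connected components stand in for their community detection algorithm.

```python
import math
from collections import deque
from itertools import combinations


def opinion_communities(opinions, cutoff):
    """Group users whose opinion vectors lie within `cutoff` Euclidean distance of each other.

    opinions maps each user to a tuple of polarity scores, one per target, all of equal length.
    Returns (communities, edge_weights).
    """
    users = sorted(opinions)
    neighbours = {user: set() for user in users}
    weights = {}
    for u, v in combinations(users, 2):
        distance = math.dist(opinions[u], opinions[v])
        if distance <= cutoff:
            # Hypothetical exponential similarity; pairs beyond the cutoff get no edge.
            weights[(u, v)] = math.exp(-distance)
            neighbours[u].add(v)
            neighbours[v].add(u)

    # Communities = connected components of the similarity graph (breadth-first search).
    communities, seen = [], set()
    for start in users:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            user = queue.popleft()
            component.append(user)
            for other in neighbours[user]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        communities.append(sorted(component))
    return communities, weights


# Hypothetical polarity scores for two targets (e.g. two products), in [-1, 1].
opinions = {
    "alice": (0.9, 0.8),
    "bob": (0.8, 0.9),
    "carol": (-0.7, -0.9),
    "dave": (-0.8, -0.6),
}
communities, weights = opinion_communities(opinions, cutoff=0.5)
print(communities)
```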
Opinion based Community Detection: Similarity Functions:
1000 labelled documents were used as the training set for NB and SVM.
Near-perfect classification of opinions can be obtained.
A user-generated data set was used to apply the community detection algorithms.
Findings: Linear functions perform poorly as the number of targets increases; exponential functions with a cutoff perform best as the number of opinions increases.
Community Detection: Findings & Evaluation
A practical application of community detection was not conducted.
Suggestion: The proposed framework can be applied in the pharmaceutical industry for online community detection
Background Literature: “CyberRx” by Radar & Subhan (2013)
Community Detection: Limitations
Community Detection: Potential Application
| Aspect | CyberRx | New Approach |
| Data Collection | Forums and blogs using Google Alerts | Additional sources such as bulletin boards |
| Keywords Used | Formal names and language | More popular brand names and consumer-driven language |
| Opinion Mining | Manual | Automated (SVM classifier) |
| Community Detection | Manual | Aggregated functions and similarity graphs |
| Findings | Two main communities: side effects and medications; changing medication | More specific communities can be identified |
Knowledge such as:
Most prevalent diseases classified by geography and demography
Most popularly used brands of drugs
Competing alternatives for a given drug
Information on specifications, variations, duration and personal experience of side effects (both normal and abnormal)
Using Social Media for Influence Propagation
People influence each other via online interactions and communications
Purchase decisions are heavily influenced by electronic word of mouth (eWoM) in social media networks
34% of Twitter users post product-related opinions at least once a week (ROI Research Institute)
Objective: Target the most influential users on social media to activate a chain of influence driven by eWoM
Literature: Khobzi (2014) conducted a basic content-based analysis of Facebook posts to identify the connection between a post's sentiment and its popularity.
Kaiser et al. (2012) analyzed opinion formation and influential users based on data collected on iPhone reviews
Okazaki et al. (2014) explored the different types of customer engagement in social media networks and their impact on influence propagation
Influence Propagation: Methodology
Focus group: IKEA customers
Training set: 300 preprocessed tweets, classified manually based on customer emotional status and content
Emotional status: Satisfied, Dissatisfied, Neutral
Content: Information, Sharing, Opinion, Question, Reply
NB, KNN and SVM classifiers were trained; NB performed best.
Influence Propagation: Application
New data set: 4000 tweets. Users were treated as nodes and tweets as the relationships between them.
Google's PageRank algorithm was used to determine the relative importance of each user.
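The PageRank step can be sketched with power iteration over a directed user graph. The slides do not say how edges are oriented; this sketch assumes an edge u -> v whenever user u retweets or replies to user v, so rank flows towards the users others react to. The damping factor 0.85 is the commonly used default, and the user names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over directed (source, target) edges; returns node -> score."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical interactions: three users retweet "ikea_fan", who replies to "user1".
interactions = [("user1", "ikea_fan"), ("user2", "ikea_fan"), ("user3", "ikea_fan"), ("ikea_fan", "user1")]
ranks = pagerank(interactions)
for user, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{user}: {score:.3f}")
```

The scores always sum to 1, so they can be read directly as each user's relative importance in the interaction graph.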
Findings: One satisfied user sharing information (positive eWoM) and three dissatisfied users spreading negative opinions
Influence Propagation: Suggestions
Conclusion: Influential users can be identified; different customer satisfaction levels are crucial.
Suggestions:
Use celebrities and convert their followers into influence makers.
Provide additional incentives to encourage engagement in discussions.
Closely monitor dissatisfied customers online and occasionally intervene in retweets by suggesting feasible solutions to demonstrate commitment.
Knowledge Discovery in SMM: Conclusions
This work consolidates the knowledge areas that could be exploited for market analysis via the predictive power of, community detection in, and influence propagation in social media.
Properly preprocessed social media data of acceptable quality, when applied to robust statistical models, could predict future market trends with considerable accuracy.
Social media has taken social relationships to the digital platform and has created opinion-based communities online. These can be used to identify genuine consumer requirements.
People express their genuine consumer experiences on social media networks, which clearly influence the purchasing decisions of other potential consumers.
An efficient framework can identify influential users online and trigger a chain of positive eWoM promoting viral marketing.
Questions & Answers