Forensic Investigation of Smartphones Using Lexicon-Based Mood Analysis and Text-Mining Methods

Panagiotis Andriotis, Atsuhiro Takasu, Theo Tryfonas

Page 1: Based Mood Analysis and Text-Mining Methods · Introduction Micro-blog Sentiment Analysis Twitter feeds vs. Short Message Service (SMS) Mood Score Calculation (Lexicon-based Approach)

Forensic Investigation of Smartphones Using Lexicon-Based Mood Analysis and Text-Mining Methods

Panagiotis Andriotis, Atsuhiro Takasu, Theo Tryfonas

Overview

Introduction

Micro-blog Sentiment Analysis

Twitter feeds vs. Short Message Service (SMS)

Mood Score Calculation (Lexicon-based Approach)

Mood Score Evaluation and Optimization for SMS

Sentiment Timeline View (developing a forensic tool)

Conclusions and Future Work

Introduction and Problem Specification

Affiliations

Is Mood Important?

It seems so. A look at apps and websites makes it clear that we share our emotions and feelings regularly.

Twitter Sentiment Analysis on the Internet (1)

Twitter Sentiment Analysis on the Internet (2)

Questions to be answered

Can we apply N.L.P. methods to perform mood analysis on SMS?

Is a Twitter feed (tweet) similar to an SMS?

Is stemming important?

Can we optimize the algorithm (focus on SMS and their characteristics)?

Can we depict the extracted result during a forensic analysis?

Data Collection and Experimental Setup

Defining Emotions: Positive vs. Negative

Emotion Classification

Positive: Joy, Happiness, Intimacy, Familiarity, Friendship, Love

Negative: Anger, Malevolence, Enmity, Fear, Disgust, Sadness

The data we used

We collected 6566 tweets from the public accounts of famous people and celebrities (TWT). We also used tweets that had already been classified (SENT140) and an SMS dataset. We utilized three different lexicons (AFINN, WordNet-Affect, NRC).

Lexicons and Algorithm

Lexicons Characteristics

AFINN (contains positive and negative words with their valence)

WordNet-Affect (WRDNT: contains synsets)

NRC word-emotion lexicon (contains numerous hashtags, words, valence)

A Bag-of-words Approach

Let $L_p = \{l_{p_i}\}$ be the set of our positive textual markers and $L_n = \{l_{n_j}\}$ be the set of our negative textual markers. $C = \{t_k\}$ is the corpus of individual tweets or SMS.

If a positive marker $l_{p_i}$ appears in a tweet or SMS $t_k$ in the corpus, we set $l_{p_i}(t_k) = 1$; otherwise $l_{p_i}(t_k) = 0$. We perform the same calculation for the negative markers $l_{n_j}$.

The sentiment score $s(t_k)$ is equal to the number of positive markers found in the tweet or SMS minus the number of negative markers found in it:

$$s(t_k) = \sum_i l_{p_i}(t_k) - \sum_j l_{n_j}(t_k)$$
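The score defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker sets are toy examples rather than entries from AFINN or any other lexicon, and the names `tokenize` and `mood_score` are hypothetical.

```python
import re

# Toy marker sets; illustrative only, not taken from any real lexicon.
POSITIVE_MARKERS = {"happy", "love", "great"}
NEGATIVE_MARKERS = {"sad", "hate", "awful"}


def tokenize(text):
    """Lowercase the text and return the set of distinct word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def mood_score(text, positive=POSITIVE_MARKERS, negative=NEGATIVE_MARKERS):
    """Return s(t_k): positive markers present minus negative markers present."""
    tokens = tokenize(text)
    # Each marker contributes 1 if it appears at least once in the message, else 0.
    pos_hits = sum(1 for marker in positive if marker in tokens)
    neg_hits = sum(1 for marker in negative if marker in tokens)
    return pos_hits - neg_hits
```

Because each marker is a 0/1 indicator, a word repeated several times in one message still counts once.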

Datasets

TWT: Twitter feeds

PoSENT140 from SENT140

NegSENT140 from SENT140

SMS dataset (sanitized)

Positive SMS (manually classified)

Negative SMS (manually classified)

Experiments and Results

Is Stemming Useful? (1)

Using the AFINN lexicon:

We calculated the mood scores of each tweet in the TWT corpus and the SENT140 corpus (no stemming).

We performed the same experiments on the same datasets using stemming (Porter’s stemming algorithm).

Finally, the same tests were done on the SMS dataset (using stemming).
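To see why stemming can change the number of lexicon hits, consider the sketch below. It does not implement Porter’s algorithm: `crude_stem` is a hypothetical, deliberately simplified suffix stripper used only to illustrate the effect, and the two-word lexicon is a toy example.

```python
import re


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for, not a copy of, Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "ly", "s"):
        # Strip the suffix only if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    if word.endswith("e") and len(word) > 3:
        return word[:-1]
    return word


def lexicon_hits(text, lexicon, stem=False):
    """Count tokens of the text that match a lexicon entry, optionally after stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if stem:
        # Stem both sides so that inflected forms meet on a common stem.
        lexicon = {crude_stem(entry) for entry in lexicon}
        tokens = [crude_stem(token) for token in tokens]
    return sum(1 for token in tokens if token in lexicon)
```

With the toy lexicon `{"love", "hate"}`, the message "She loved it but hated waiting" produces no hits without stemming and two hits with it. A stripper this crude also conflates unrelated words (for example "hate" and "hat"), which a real stemmer handles more carefully.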

Is Stemming Useful? (2)

The following table suggests that our results were better when we used stemming (tweets dataset).

Distribution of textual markers within the datasets without using stemming and with stemming.

Is Stemming Useful? (3)

The following table suggests that our results were better when we used stemming (SMS dataset).

Distribution of textual markers within the SMS datasets without using stemming and with stemming.

Visualizing the results

Distribution of lexicon words found in tweets without stemming (left), and in stemmed tweets (blue) and SMS (red) (right).

Blue: TWT, Red: NegSENT140, Green: PoSENT140

Evaluating the classification ability

We used the TWT dataset with stemming and the three lexicons to decide which lexicon we should utilize:

1. AFINN

2. WRDNT

3. NRC

AFINN results were already discussed.

WRDNT contained more formal vocabulary. (Neutral $s(t_k)$ for 68.5% of tweets.)

NRC consists of a plethora of words, abbreviations and ‘internet slang’. More than 20 markers could be found in a single tweet.

We decided to use AFINN.
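One way to compare lexicons along these lines is to measure how many messages each one leaves with a neutral score. The sketch below assumes the simple presence-based score described earlier; the function names and the toy lexicons in the usage note are hypothetical and do not reproduce the reported figures.

```python
import re


def mood_score(text, positive, negative):
    """Presence-based score: distinct positive markers minus distinct negative markers."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return len(tokens & positive) - len(tokens & negative)


def neutral_share(messages, positive, negative):
    """Return the fraction of messages whose score is exactly 0."""
    if not messages:
        raise ValueError("messages must not be empty")
    neutral = sum(1 for m in messages if mood_score(m, positive, negative) == 0)
    return neutral / len(messages)
```

For the three toy messages "lol that was gr8", "I am delighted" and "meh", a formal lexicon containing only "delighted" and "miserable" leaves two of the three neutral, while one that also lists "lol", "gr8" and "meh" leaves none. Note that a score of 0 can also result from equal numbers of positive and negative markers, so a neutral score does not always mean that no marker was found.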

Optimizing the hit rates for SMS

Developing a forensic tool to demonstrate the use of Mood Analysis for SMS and Instant Messengers

Data Extraction from Android Devices

Open source tools.

Android SDK

USB cable -> Developer Options -> USB Debugging -> On

From Platform Tools -> ADB

Get a root shell

Mount (to see file system info)

Use dd on the data partition and pull the image to the computer

SMS on: /data/com.android.providers.telephony/databases/mmssms.db
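Once mmssms.db has been recovered from the image, its messages can be read with any SQLite client. The sketch below uses Python's standard `sqlite3` module. It assumes a table named `sms` with columns `_id`, `address`, `date` (milliseconds since the Unix epoch), `type` and `body`; that layout is an assumption that may differ between Android versions and should be checked against the actual file. The function name `read_sms` is hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def read_sms(db_path):
    """Return (id, address, UTC datetime, type, body) tuples ordered by date."""
    # Open read-only so that the working copy of the evidence is not modified.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # Assumed schema: table `sms` with these column names.
        rows = conn.execute(
            "SELECT _id, address, date, type, body FROM sms ORDER BY date"
        ).fetchall()
    finally:
        conn.close()
    return [
        (
            row_id,
            address,
            # Assumed unit: milliseconds since the Unix epoch.
            datetime.fromtimestamp(date_ms / 1000, tz=timezone.utc),
            msg_type,
            body,
        )
        for row_id, address, date_ms, msg_type, body in rows
    ]
```

Working on a copy of the database and opening it read-only keeps the extracted image unchanged during analysis.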

The Design Concept

Using the Apache Lucene library for text pre-processing, indexing and searching.

The MySQL database schema

We keep the extracted keywords of each SMS in separate cells, and from the stemmed words we calculate the mood score using AFINN, emoticons and valence.
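A score that combines word valence with emoticons might look like the following sketch. It is not the tool's actual scoring code: the valences are illustrative values on an AFINN-like integer scale (AFINN rates words from -5 to +5), the emoticon table is hypothetical, and stemming is omitted for brevity.

```python
import re

# Illustrative valences on an AFINN-like scale (-5..+5); not copied from AFINN.
WORD_VALENCE = {"love": 3, "happy": 3, "hate": -3, "sad": -2}
# Hypothetical emoticon valences.
EMOTICON_VALENCE = {":)": 2, ":-)": 2, ":(": -2, ":-(": -2}


def weighted_mood_score(text):
    """Sum the valence of every known word and emoticon in the message."""
    score = 0
    for token in text.split():
        if token in EMOTICON_VALENCE:
            score += EMOTICON_VALENCE[token]
            continue
        # Drop punctuation and digits before the word lookup.
        word = re.sub(r"[^a-z']", "", token.lower())
        score += WORD_VALENCE.get(word, 0)
    return score
```

Unlike the 0/1 marker count, this variant lets a strongly charged word or an emoticon outweigh several mild ones.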

Sentiment Timeline View (1)

Extracted from the list of all messages in the SQLite database.

Sentiment Timeline View (2)

Extracted from messages exchanged with one entity (left) or from messages sent by the person under investigation (right).
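The data behind such timeline views can be prepared by summing the mood scores per day, optionally restricted to one correspondent or to sent messages only. The sketch below is an assumption about how this could be done, not the tool's implementation; the record layout `(timestamp, address, is_sent, score)` and the function name `daily_mood` are hypothetical.

```python
from collections import defaultdict


def daily_mood(records, address=None, sent_only=False):
    """Sum mood scores per calendar day.

    records: iterable of (timestamp, address, is_sent, score) tuples,
    where timestamp is a datetime. Returns a date-sorted list of
    (date, total score) pairs.
    """
    totals = defaultdict(int)
    for timestamp, addr, is_sent, score in records:
        if address is not None and addr != address:
            continue  # keep only messages exchanged with one entity
        if sent_only and not is_sent:
            continue  # keep only messages sent by the phone's owner
        totals[timestamp.date()] += score
    return sorted(totals.items())
```

Plotting the returned pairs over time gives a simple sentiment timeline in which strongly negative or positive days stand out.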

Searching the index (some advantages)

• Faster search (here we looked for the word ‘happy’).

• Friendlier output providing detailed information (id from original SQLite db, date, etc.).

• Indexing and the specific methodology can be applied to all data in the phone with text format, e.g. emails.
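The speed advantage comes from looking a term up in a prebuilt index instead of scanning every message. The sketch below is a minimal inverted index in plain Python that shows the idea; it is a stand-in for illustration and does not use or reproduce the Apache Lucene API. The names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(messages):
    """Map each term to the sorted list of message ids that contain it.

    messages: dict of message id -> message text.
    """
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for term in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[term].add(msg_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Return the ids of the messages containing the term (case-insensitive)."""
    return index.get(term.lower(), [])
```

Keeping the original message ids in the index is what allows each hit to be traced back to its row in the source database.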

Conclusions and Future Work

Outcome

Conclusions

It is possible to extract feelings from SMS using techniques applied to Twitter or micro-blogs.

The lexicon, emoticons and word valence are important to the final outcome ($s(t_k)$).

The Sentiment Timeline View can highlight regions of interest.

We can merge N.L.P. with Forensics to automate specific tasks.

Future Work

Investigate the efficiency of Support Vector Machines against our simplistic bag-of-words approach. (Naïve Bayesian classification can exceed 75% accuracy and SVMs may produce better results.)

Apply the concepts of Mood Analysis and Text Mining to all the text material in a smartphone.

Acknowledgement

This work has been supported by the European Union’s Prevention of and Fight against Crime Programme “Illegal Use of Internet” - ISEC 2010 Action Grants, grant ref. HOME/2010/ISEC/AG/INT-002 and the Systems Centre of the University of Bristol.

Thank you!

