Date posted: 14-Jul-2015
Category: Education
Uploaded by: kritikansalk
Article Interestingness Based on Twitter Sentiments
Group - 7
Kriti Kansal - 201101012
Arpit Bhayani - 201305515
Anirudh Beria - 201001104
Rishabh Gupta - 201307676
Why Interesting Articles?
Explosive growth of online articles.
Higher e-commerce value will result from:
o Number of hits (viewership)
o Relevance
o Interestingness
Twitter : The Microblogging Giant
Popular Social Networking Platform.
A precise and concise source of current affairs.
A preferred platform for expressing sentiments.
Approach
Flavour of Article
o Named-Entity Recognition
o Prominent Entities
Live Twitter Stream
o Sentiments
o Trends
Work Flow
Named Entity Recognition
Identify Pure Nouns
o A word that has never been used as any other POS (part of speech)
o A word that does not exist in the English language
Identify Pure Noun Phrases
o Proximity of a word to Pure Nouns
Classify Named Entities
o Use of hypernym graphs, WordNet
o Assign classes such as Person, Organization, etc. to entities
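The pure-noun rules above can be sketched roughly as follows. This is a minimal illustration, not the group's implementation: `TOY_LEXICON` is a hypothetical stand-in for a lexical resource such as WordNet, "pure" is read as "unknown to the lexicon, or only ever a noun", and "proximity" is read as simple adjacency. The function names are likewise assumptions.

```python
# Toy lexicon: lowercase word -> parts of speech it can take.
# Hypothetical stand-in for a real resource such as WordNet.
TOY_LEXICON = {
    "obama": {"noun"},
    "london": {"noun"},
    "run": {"noun", "verb"},
    "visits": {"noun", "verb"},
    "light": {"noun", "verb", "adjective"},
    "the": {"determiner"},
}


def is_pure_noun(word, lexicon=TOY_LEXICON):
    """Treat a word as 'pure' if it is unknown or is only ever a noun (assumed reading)."""
    tags = lexicon.get(word.lower())
    return tags is None or tags == {"noun"}


def pure_noun_phrases(tokens, lexicon=TOY_LEXICON):
    """Merge runs of adjacent pure nouns into candidate phrases."""
    phrases, current = [], []
    for token in tokens:
        if is_pure_noun(token, lexicon):
            current.append(token)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


print(pure_noun_phrases("Barack Obama visits London".split()))
```

Classifying the resulting phrases into Person, Organization, and so on would need a real hypernym lookup, which this sketch leaves out.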
Sentiment Analysis
May refer to the affective state, intended emotional communication, or judgement of a speaker
Determines the attitude of a speaker with respect to some topic
Helps in identifying the overall contextual polarity of a document
A live Twitter stream is used to collect tweets corresponding to the various named entities for the sentiment calculation
Sentiment Scores
Open a Twitter stream to get live tweets for each entity derived from an article
Preprocess Tweets:
o Tweet words cleaning
o Elimination of Stop Words
o Spell correction
Classify into Positive and Negative Tweets
○ If num(Positive_Words) > num(Negative_Words)
then class(Tweet) = Pos_Tweet
○ else
class(Tweet) = Neg_Tweet
For each named entity calculate its sentiment score as:
o Score(Entity) = (num(Pos_Tweet) - num(Neg_Tweet)) / num(Total_Entity_Tweets)
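The steps on this slide can be sketched as below. The word lists are tiny illustrative placeholders rather than the lexicon the group used, spell correction is left out, and all function names are assumptions. Ties go to the negative class, following the if/else rule above; returning 0.0 for an entity with no tweets is an added assumption.

```python
import re

# Tiny illustrative word lists; a real system would use full lexicons.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "for", "it", "this"}
POSITIVE_WORDS = {"good", "great", "love", "win"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "lose"}


def preprocess(tweet):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    words = re.findall(r"[a-z']+", text)
    return [w for w in words if w not in STOP_WORDS]


def classify(tweet):
    """Positive only if positive words outnumber negative ones, else negative."""
    words = preprocess(tweet)
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return "pos" if pos > neg else "neg"


def entity_score(tweets):
    """(num positive tweets - num negative tweets) / total tweets for the entity."""
    if not tweets:
        return 0.0  # assumption: no tweets means a neutral score
    labels = [classify(t) for t in tweets]
    return (labels.count("pos") - labels.count("neg")) / len(labels)


tweets = [
    "I love the new phone, great camera!",
    "Awful battery, I hate it http://t.co/x",
    "Great win for the team @fan",
    "Good phone",
]
print(entity_score(tweets))  # three positive, one negative: (3 - 1) / 4
```

Because every tweet lands in one of the two classes, the score always falls between -1 and 1.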
Interestingness Scores
An interestingness score, used for ranking the articles, is computed from the sentiment scores of the entities belonging to each article:
I1 = ( ∑ |Score(Entity)| ) / Total_Entities
o Incorporates Sentiment Of Entity
I2 = I1 + factor * min(num(Pos_Entities), num(Neg_Entities)) / Total_Article_Tweets
o Higher weight for contrasting Entities as they increase interestingness
I3 = I2 * Total_Article_Tweets
o A greater number of live tweets makes the article more trending
The final ranking of the articles is based on the interestingness score I3 calculated above.
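A minimal sketch of the three scores and the final ranking follows. Several points are assumptions, since the slides do not state them: I1 is read as averaging the absolute value of each entity score, an entity counts as positive or negative by the sign of its score, `factor` defaults to 1.0, and an article with no entities or no tweets scores zero. The function name and the sample articles are hypothetical.

```python
def interestingness(entity_scores, total_article_tweets, factor=1.0):
    """Return (I1, I2, I3) for one article from its entities' sentiment scores."""
    if not entity_scores or total_article_tweets == 0:
        return 0.0, 0.0, 0.0  # assumption: nothing to score
    # I1: mean magnitude of entity sentiment (absolute value is an assumed reading).
    i1 = sum(abs(s) for s in entity_scores) / len(entity_scores)
    # I2: bonus for contrasting entities (both positive and negative present).
    pos_entities = sum(s > 0 for s in entity_scores)
    neg_entities = sum(s < 0 for s in entity_scores)
    i2 = i1 + factor * min(pos_entities, neg_entities) / total_article_tweets
    # I3: scale by tweet volume so heavily tweeted articles rank higher.
    i3 = i2 * total_article_tweets
    return i1, i2, i3


# Hypothetical articles: name -> (entity scores, total tweets for the article).
articles = {
    "article_a": ([0.5, -0.5], 10),
    "article_b": ([0.2, 0.4], 5),
}
ranked = sorted(
    articles,
    key=lambda name: interestingness(*articles[name])[2],
    reverse=True,
)
print(ranked)
```

Here article_a has one positive and one negative entity, so it earns the contrast bonus and, with more tweets, ranks ahead of article_b.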
Testing Results
BBC news website dataset was used.
It has 2225 Documents with 9636 entities.
For Named Entity Recognition, 88% precision and 81% recall were obtained, using CoreNLP (Stanford NLP library) as the reference standard.
For Sentiment Analysis, 65% of the tweets were classified with the right sentiment when manually evaluated.
The final interestingness scores are evaluated using the F-measure.
The F-measure against a manual interestingness classification of 100 test documents was 0.38.
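For reference, the F-measure combines precision and recall into a single number. The sketch below assumes the balanced form (beta = 1, the harmonic mean), which the slides do not specify. As an illustration only, it is applied to the NER precision and recall quoted above; the resulting value of roughly 0.84 is derived here and is not a figure reported in the slides.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustration using the NER precision and recall quoted in the results.
print(round(f_measure(0.88, 0.81), 2))
```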
Future Work
• Batch processing of tweets alongside a background live feed, as opposed to the current use of live Twitter feeds only for sentiment analysis
• Interestingness is ultimately subjective, so the aim is to develop measures that take the user's preferences into account on top of the objective interestingness scores
References
• Opinion mining, Sentiment Analysis, and Opinion Spam Detection – Vasudeva Varma
• iScore: Measuring the Interestingness of Articles in a Limited User Environment
• Interestingness Measures for Data Mining: A Survey
• A Survey of Interestingness Measures for Knowledge Discovery
• https://semantria.com/features/entityextraction
• http://en.wikipedia.org/wiki/Namedentity_recognition
• http://en.wikipedia.org/wiki/Sentiment_analysis
Thank You