UNDERSTANDING USER-COMMUNITY ENGAGEMENT BY MULTI-FACETED FEATURES:
A CASE STUDY ON TWITTER
Hemant Purohit1, Yiye Ruan2, Amruta Joshi2,
Srinivasan Parthasarathy2, Amit Sheth1
1Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA
2Dept. of Computer Science & Engineering, Ohio State University, USA
March 29, 2011
SoME 2011 (in conjunction with WWW 2011)
OUTLINE
- (What) User-Community Engagement
- (Why) Motivation
- (How) Problem Formalization: Approach, Terminology, Definition
- Analysis Framework: People-Content-Network Analysis (PCNA)
- Experiments: Datasets and Event Categorization, Features, Results
- Insights
- Conclusion & Future Work
USER-COMMUNITY ENGAGEMENT
- Multiple topics surrounding an event are discussed on social media.
- Each topic constitutes a community of users discussing it, e.g., the Japan Earthquake community.
- How do we understand the phenomenon of user participation (engagement) in topic discussions?
Image: http://itcilo.wordpress.com
MOTIVATION
User Engagement Analysis
- Business: How do communities form during a product launch? What factors attract users to engage in these communities and thereby spread the message further?
- Crisis Management: Effective communication. How quickly can we disseminate information between resource providers and people in need of resources?
PROBLEM FORMALIZATION: Approach
- User engagement has been studied in many forms: community formation and detection, information propagation, link prediction, etc.
- It involves a three-dimensional dynamic:
  - Content: the topic of interest
  - People: participants who engage in discussion about the topic
  - Network: the community structure formed around the topic discussion
- Rather than limiting the analysis to one dimension, we propose a multidimensional approach, with a case study on Twitter.
EARLIER APPROACHES
Content (topic of interest) OR People (participants in the discussion) OR Network (community around the topic)
Images: tupper-lake.com/.../uploads/Community.jpg http://www.iconarchive.com/show/people-icons-by-aha-soft/user-icon.html
OUR APPROACH
Content (topic of interest) AND People (participants in the discussion) AND Network (community around the topic)
PROBLEM FORMALIZATION: Terminology
- Event-Oriented Community: an implicit group of social network users who have joined the discussion (by posting messages) on a topic about an event.
- Slice: a collection of messages relevant to the topic of discussion, posted during a fixed-length time window.
- Snapshot: the state of the network at the point in time at which user profile and connection information are crawled.
- Active Window: freshness matters!
- Active Community
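A minimal sketch of how a stream of messages could be cut into slices. It assumes the messages have already been filtered for topic relevance and arrive as (timestamp, text) pairs; the function name `slice_messages` and the 6-hour window in the example are hypothetical choices, since the slides only say the window is fixed-length.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slice_messages(messages, start, window):
    """Group (timestamp, text) messages into fixed-length slices.

    Slice i covers [start + i*window, start + (i+1)*window).
    Messages posted before `start` are ignored.
    """
    slices = defaultdict(list)
    for ts, text in messages:
        if ts < start:
            continue
        index = (ts - start) // window  # timedelta // timedelta gives an int
        slices[index].append(text)
    return dict(slices)


msgs = [
    (datetime(2010, 9, 8, 1), "a"),
    (datetime(2010, 9, 8, 7), "b"),
    (datetime(2010, 9, 8, 5, 59), "c"),
]
print(slice_messages(msgs, datetime(2010, 9, 8), timedelta(hours=6)))
```

Keying slices by integer index keeps "the next slice" a simple `i + 1` lookup, which is what the prediction problem needs when it refers to a future slice.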
PROBLEM FORMALIZATION: Definition
Binary classification problem for user-community link prediction.
User Engagement Prediction Problem: Given
1) an event-oriented community C formed around a topic of discussion, and
2) a Twitter user U ∈ C,
predict whether U will engage in C in a future slice, by composing a new tweet or retweeting an existing tweet that contains keywords or hashtags related to C's underlying event. If so, U is a positive record; otherwise, U is a negative record.
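The labeling rule above can be sketched as follows. This is an illustration under assumptions, not the authors' code: a slice is a list of (author, text) pairs, a retweet appears as a post by the retweeting user (e.g. "RT @x: ..."), keywords are single tokens, and the names `is_engaged` and `label` are hypothetical.

```python
import re


def is_engaged(user, future_slice, keywords, hashtags):
    """True if `user` authored an event-related tweet or retweet in the slice.

    future_slice: iterable of (author, text) pairs. Assumption: a retweet
    shows up as a post by the retweeting user, e.g. "RT @x: ...".
    """
    kw = {k.lower() for k in keywords}
    tags = {h.lower().lstrip("#") for h in hashtags}
    for author, text in future_slice:
        if author != user:
            continue
        words = set(re.findall(r"\w+", text.lower()))
        found_tags = {t.lower() for t in re.findall(r"#(\w+)", text)}
        if words & kw or found_tags & tags:
            return True
    return False


def label(user, future_slice, keywords, hashtags):
    """1 for a positive record, 0 for a negative record."""
    return 1 if is_engaged(user, future_slice, keywords, hashtags) else 0


s = [("alice", "Praying for Japan #JapanQuake"), ("bob", "lunch time")]
print(label("alice", s, ["earthquake"], ["#japanquake"]))  # 1
print(label("bob", s, ["earthquake"], ["#japanquake"]))    # 0
```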
ANALYSIS FRAMEWORK: PEOPLE-CONTENT-NETWORK ANALYSIS (PCNA)
EXPERIMENTS: Dataset & Event categorization
- Study on Twitter data
- Events have varied characteristics, and we hypothesize that user engagement around them is affected by different variables.
- No standard event categorization is available, so we categorize the events by observing the data over time, along four dimensions:
  - Global (G) vs. Local (L), e.g., Japan Earthquake vs. Iowa State Fair
  - Deterministic (D) vs. Unexpected (U), e.g., Emmy Awards vs. Japan Earthquake
  - Compact (C) vs. Loose (Ls), e.g., ISWC conference vs. Japan Earthquake
  - Transient (T) vs. Lasting (Lt), e.g., President’s Speech vs. Egypt Revolution
EVENTS (LABELED AS PER OUR CATEGORIZATION)
- ClevelandShowPremiere: Second-season premiere of the animated TV series The Cleveland Show. September 26. Global, loose, deterministic, transient.
- DiscoveryBuildingCrisis: Hostage crisis at the headquarters of Discovery Channel, Maryland. September 1. Local, loose, unexpected, transient.
- EmmyAwards: 62nd Primetime Emmy Awards. August 29. Global, loose, deterministic, lasting.
- GoogleInstantSearch: Launch of Google Instant in the United States. September 8. Global, loose, unexpected, transient.
- HeismanTrophy: Reggie Bush’s announcement that he would forfeit the 2005 Heisman Trophy. September 14. Local, compact, unexpected, lasting.
- IowaStateFair: Iowa State Fair. August 12-22. Local, loose, deterministic, lasting.
- JewishNewYear: Jewish New Year 5771. September 8-10. Global, compact, deterministic, transient.
- LindsayLohanHearing: Lindsay Lohan’s hearing on probation revocation, and the verdict. September 24. Local, loose, deterministic, transient.
- LinuxCon: Annual convention organized by the Linux Foundation. August 10-12. Global, compact, deterministic, lasting.
- LondonTubeStrike: London Tube strike. September 6. Local, loose, deterministic, transient.
- RichCroninDeath: Death of singer and songwriter Rich Cronin. September 8. Local, loose, unexpected, transient.
- ScottPilgrimRelease: Release of the movie Scott Pilgrim vs. the World. August 13. Global, loose, deterministic, lasting.
- SESSanFrancisco: Search Engine Strategies 2010 in San Francisco. August 16-20. Global, compact, deterministic, lasting.
- StuxnetWorm: Confirmation of the Stuxnet worm attack on the Iranian nuclear program. September 24. Global, loose, unexpected, lasting.
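To group results by event type, the four binary dimensions can be stored as a small record. The sketch below parses label strings in the form used in the event list; the class `EventCategory`, the function `parse_category`, and the hyphen-joined short code (e.g. "G-D-Ls-T") are hypothetical helpers, not part of the original study.

```python
from dataclasses import dataclass

# Which dimension each (lower-cased) label belongs to.
_DIMENSION = {
    "global": "scope", "local": "scope",
    "deterministic": "predictability", "unexpected": "predictability",
    "compact": "compactness", "loose": "compactness",
    "transient": "duration", "lasting": "duration",
}
# Abbreviations from the categorization slide.
_SHORT = {"Global": "G", "Local": "L", "Deterministic": "D", "Unexpected": "U",
          "Compact": "C", "Loose": "Ls", "Transient": "T", "Lasting": "Lt"}


@dataclass(frozen=True)
class EventCategory:
    scope: str           # Global / Local
    predictability: str  # Deterministic / Unexpected
    compactness: str     # Compact / Loose
    duration: str        # Transient / Lasting

    def code(self):
        parts = (self.scope, self.predictability, self.compactness, self.duration)
        return "-".join(_SHORT[p] for p in parts)


def parse_category(labels):
    """Parse e.g. 'Global, loose, deterministic, transient.'"""
    values = {}
    for raw in labels.split(","):
        word = raw.strip().rstrip(".").lower()
        dim = _DIMENSION.get(word)
        if dim is None:
            raise ValueError(f"unknown label: {word!r}")
        if dim in values:
            raise ValueError(f"two labels for {dim}")
        values[dim] = word.capitalize()
    if len(values) != 4:
        raise ValueError("expected exactly one label per dimension")
    return EventCategory(**values)


events = {
    "ClevelandShowPremiere": "Global, loose, deterministic, transient.",
    "RichCroninDeath": "Local, loose, unexpected, transient.",
    "StuxnetWorm": "Global, loose, unexpected, lasting.",
}
unexpected = sorted(n for n, l in events.items()
                    if parse_category(l).predictability == "Unexpected")
print(unexpected)  # ['RichCroninDeath', 'StuxnetWorm']
```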
EXPERIMENTS: Features
- Organized in the PCNA framework: Node/Author features (P), Content features (C), Community features (N)
- Extracted for each potential community member (U) in each slice, where U belongs to the union of the follower lists of all active community members
[Figure: the whole topic community contains the active community; a potential new member (U) is linked to U's followees in the active community. EDGE: A → B if B follows A.]
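The candidate set described above can be sketched with plain sets. Assumptions: follower and followee lists come from a crawled snapshot as dicts of user id to ids, and users who are already active are excluded from the candidates (the slide does not say this explicitly). `candidate_members` and `active_friends` are hypothetical names.

```python
def candidate_members(active_members, followers):
    """Union of the follower lists of all active members.

    followers: dict mapping a user id to the ids of users who follow them.
    Assumption: already-active users are removed from the candidate set.
    """
    candidates = set()
    for member in active_members:
        candidates.update(followers.get(member, ()))
    return candidates - set(active_members)


def active_friends(user, active_members, followees):
    """U's friends (followees) who belong to the active community."""
    return set(followees.get(user, ())) & set(active_members)


followers = {"a": ["u1", "u2", "b"], "b": ["u2", "u3"]}
print(sorted(candidate_members({"a", "b"}, followers)))  # ['u1', 'u2', 'u3']
```

Features for each candidate are then computed only over `active_friends(U, ...)`, matching the restriction to friends in the active community.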
EXPERIMENTS: Features (cont.)
- Community features [characteristics of the active community/network under consideration]:
  - wccSize: size of the weakly connected component (WCC) to which U’s friends belong in the active network
  - wccPercent: ratio of wccSize to the size of the active network
  - connectivity: number of U's active friends (i.e., followees) in the community
  - communitySize: size of the active community
- Author features [characteristics of the friends U is following]; only friends in the active community are considered:
  - logFollower: logarithm of the follower count
  - logFollowee: logarithm of the followee count
  - Klout[1]: an integrated measure of user influence and popularity
  - Other profile information and activity history[2]
[1] http://www.klout.com
[2] Future work
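A sketch of the community and author features for one candidate U, using a small union-find for weakly connected components so it needs no graph library. Several points are assumptions not stated in the slides: if U's friends fall in more than one WCC the largest is reported, both wccPercent and communitySize use the number of active members, author features are averaged over U's active friends, and `log1p` (natural log of 1 + count) is used so zero counts are defined. Klout is omitted.

```python
import math
from collections import defaultdict


def weak_components(nodes, edges):
    """Weakly connected components via union-find (edge direction ignored)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        if a in parent and b in parent:  # keep edges inside the active network
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def community_features(friends, active_nodes, active_edges):
    """wccSize, wccPercent, connectivity, communitySize for one candidate U."""
    active_friends = set(friends) & set(active_nodes)
    components = weak_components(active_nodes, active_edges)
    # Assumption: if U's friends span several WCCs, report the largest one.
    wcc_size = max((len(c) for c in components if c & active_friends), default=0)
    size = len(active_nodes)
    return {
        "wccSize": wcc_size,
        "wccPercent": wcc_size / size if size else 0.0,
        "connectivity": len(active_friends),
        "communitySize": size,
    }


def author_features(friends, follower_count, followee_count):
    """Mean log(1 + count) over U's active friends (aggregation is assumed)."""
    if not friends:
        return {"logFollower": 0.0, "logFollowee": 0.0}
    n = len(friends)
    return {
        "logFollower": sum(math.log1p(follower_count[f]) for f in friends) / n,
        "logFollowee": sum(math.log1p(followee_count[f]) for f in friends) / n,
    }


nodes = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c")]  # a -> b means b follows a
print(community_features({"a", "x"}, nodes, edges))
```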
EXPERIMENTS: Features (cont.)
- Content features [characteristics of tweets posted by U's active friends]:
  - keywords: number of event-relevant keywords
  - hashtags: number of event-relevant hashtags
  - retweet: number of retweets
  - mention: number of mentions
  - url: relevance-adjusted number of hyperlinks (each irrelevant hyperlink counts as -1)
  - subjectivity: subjectivity scores for words and emoticons
  - Linguistic cues (LIWC[1] analysis): language-usage features; the top 3 components from Principal Component Analysis (PCA) are used
[1] http://www.liwc.net
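A sketch of the count-based content features over the tweets of U's active friends. Assumptions: a retweet is a tweet starting with "RT @", the relevance test for hyperlinks is supplied by the caller (`is_relevant_url` is a hypothetical predicate), and the "RT @user" prefix counts as a mention. Subjectivity and the LIWC/PCA features are not covered here.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def content_features(tweets, keywords, hashtags, is_relevant_url):
    """Count-based content features over a list of tweet texts."""
    kw = {k.lower() for k in keywords}
    tags = {h.lower().lstrip("#") for h in hashtags}
    feats = {"keywords": 0, "hashtags": 0, "retweet": 0, "mention": 0, "url": 0}
    for text in tweets:
        words = re.findall(r"\w+", text.lower())
        feats["keywords"] += sum(1 for w in words if w in kw)
        feats["hashtags"] += sum(1 for t in HASHTAG.findall(text) if t.lower() in tags)
        if text.startswith("RT @"):  # assumed retweet convention
            feats["retweet"] += 1
        feats["mention"] += len(MENTION.findall(text))
        for url in URL.findall(text):
            # relevant link +1, irrelevant link -1, as on the slide
            feats["url"] += 1 if is_relevant_url(url) else -1
    return feats


tweets = [
    "RT @alice: Earthquake hits Japan #JapanQuake http://news.example/q",
    "Stay safe everyone @bob http://spam.example/x",
]
print(content_features(tweets, ["earthquake", "japan"], ["#japanquake"],
                       lambda u: "news" in u))
```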
WAIT A MINUTE!
- Not all content has been viewed!
  - Novelty and attention: a user is likely to see new or recent content (tweets) and then join the community
  - Apply temporal weighting to the features
- Dataset imbalance: too many negative records!
  - Alleviated by the SMOTE method: over-sampling of positive records and under-sampling of negative ones
- Not all users are active!
  - Apply weighting by activity level, based on last activity[1]
[1] Future work
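The first two remedies can be sketched as follows. The slides do not give the temporal weighting function, so the exponential decay with a 6-hour half-life below is an assumption. The SMOTE part follows the standard technique (a synthetic positive lies at a random point on the segment between a positive record and one of its k nearest positive neighbours), paired with random under-sampling of negatives. All function names are hypothetical.

```python
import math
import random


def temporal_weight(age_hours, half_life_hours=6.0):
    """Assumed exponential decay: a tweet one half-life old counts half."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_count(ages_hours, half_life_hours=6.0):
    """Temporally weighted version of a count feature."""
    return sum(temporal_weight(a, half_life_hours) for a in ages_hours)


def smote(minority, n_new, k=5, seed=0):
    """SMOTE: interpolate between a minority point and a near minority neighbour."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Random under-sampling of the majority (negative) class."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


print(weighted_count([0.0, 6.0, 12.0]))  # 1.75
```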
EXPERIMENTS
We run the following experiment groups:
- allFeatures (All): all three feature groups
- onlyContent (Con.): content features only
- onlyAuthor (Aut.): author features only
- onlyCommunity (Com.): community features only
Classifier: SVM (LibSVM) with RBF kernel, gamma = 8, C = 32
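The ablation setup can be sketched as a mapping from experiment group to feature columns, together with the RBF kernel that LibSVM uses, K(x, y) = exp(-gamma * ||x - y||^2), here with gamma = 8. The column lists are an illustrative subset (subjectivity, LIWC and Klout are left out) and the names `FEATURE_GROUPS`, `EXPERIMENTS` and `select` are hypothetical.

```python
import math

FEATURE_GROUPS = {
    "author": ["logFollower", "logFollowee"],
    "content": ["keywords", "hashtags", "retweet", "mention", "url"],
    "community": ["wccSize", "wccPercent", "connectivity", "communitySize"],
}

EXPERIMENTS = {
    "allFeatures": ["author", "content", "community"],
    "onlyContent": ["content"],
    "onlyAuthor": ["author"],
    "onlyCommunity": ["community"],
}


def select(record, experiment):
    """Feature vector for one record, restricted to the experiment's groups."""
    cols = [c for g in EXPERIMENTS[experiment] for c in FEATURE_GROUPS[g]]
    return [record[c] for c in cols]


def rbf_kernel(x, y, gamma=8.0):
    """LibSVM's RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

In practice the vectors from `select` would be passed to an SVM trainer rather than to `rbf_kernel` directly; for example, scikit-learn's `sklearn.svm.SVC`, which is built on LibSVM, accepts `kernel="rbf"`, `gamma=8` and `C=32`.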
EXPERIMENTS: Results
[Table: Summary of prediction accuracy (%) by event type; statistically significant results are in bold.]
INSIGHTS
- The onlyCommunity classifiers perform worst.
  - The latent nature of network features makes them difficult for a user to perceive directly.
- The onlyContent classifiers perform best among the single-feature-group classifiers.
  - Some users end up participating in a discussion after observing information on the public timeline; these ad-hoc users are hard to capture through network analysis alone.
  - Content engages by its quality and nature (information sharing, a call for action, or crowdsourcing). For example, a link to an image or video (evidential content) about Reggie Bush's surrender of the Heisman Trophy in September 2010 is likely to provoke far more thought in a user's mind and draw the user into the discussion.
INSIGHTS (Cont.)
- The onlyAuthor classifiers perform comparably to the onlyContent classifiers for some topics.
  - This reflects the effective presence of influential people in the discussion group.
  - Insufficient content features, reflected by low average connectivity, can be compensated for by author features (e.g., Rich Cronin Death).
- Statistical significance testing shows that the allFeatures classifiers perform better than, or equivalently to, every single-feature-group classifier for 12 out of 14 topics.
  - The advantage of using all features dominates where the degree of randomness in individual dimensions can be very high (e.g., Discovery Building attack).
INSIGHTS (Cont.)
- There is no significant correlation between the selection of feature groups and the lasting vs. transient event types.
  - A possible reason is a shift in event characteristics over time.
- The advantage of allFeatures over the single feature groups is generally stronger for unexpected topics than for deterministic ones.
  - The degree of randomness is high in discussions surrounding unexpected events.
CONCLUSION & FUTURE WORK
- No single dimension (People, Content, or Network) can be expected to perform well across all types of topic discussions; hence there is a strong need to study the dynamics of user engagement with the PCNA framework.
- Experiments with a more refined event-type taxonomy and more user-engagement factors, taking into account shifts in event characteristics over time
- Semantic analysis of content to enhance the content features
- Experiments on other social networks: forums, DBLP
QUESTIONS?
Paper at: http://knoesis.org/library/resource.php?id=1095
More on Social Media @ Kno.e.sis at http://knoesis.org/research/semweb/projects/socialmedia/