
Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter

Date post: 21-May-2015
Upload: knoesis-center-wright-state-university
Description:
H. Purohit, Y. Ruan, A. Joshi, S. Parthasarathy, A. Sheth. Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter. in SoME 2011 (Workshop on Social Media Engagement, in conjunction with WWW 2011), March 29, 2011. Paper: http://knoesis.org/library/resource.php?id=1095 More on Social Media @ Kno.e.sis at http://knoesis.org/research/semweb/projects/socialmedia/
Transcript
Page 1: Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter

UNDERSTANDING USER-COMMUNITY ENGAGEMENT BY MULTI-FACETED FEATURES: A CASE STUDY ON TWITTER

Hemant Purohit¹, Yiye Ruan², Amruta Joshi², Srinivasan Parthasarathy², Amit Sheth¹

¹ Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA

² Dept. of Computer Science & Engineering, Ohio State University, USA

March 29, 2011
SoME 2011 (In Conjunction with WWW 2011)


OUTLINE

- (What) User-Community Engagement
- (Why) Motivation
- (How) Problem Formalization
  - Approach
  - Terminology
  - Definition
- Analysis Framework
  - People-Content-Network Analysis (PCNA)
- Experiments
  - Datasets and Event Categorization
  - Features
  - Results
- Insights
- Conclusion & Future Work


USER-COMMUNITY ENGAGEMENT

Multiple topics surrounding events are discussed on social media.

Each topic gives rise to a community of users discussing it, e.g., the Japan Earthquake community.

How do we understand the phenomenon of user participation (engagement) in topic discussions?

Image: http://itcilo.wordpress.com


MOTIVATION

User Engagement Analysis

Business:
- How do communities form during a product launch?
- What factors attract users to engage in these communities, thereby spreading the message further?

Crisis Management:
- Effective communication: how quickly can we disseminate information between resource providers and people in need of resources?


PROBLEM FORMALIZATION: Approach

User engagement has been studied in many forms: community formation and detection, information propagation, link prediction, etc.

It involves a three-dimensional dynamic:
- Content: the topic of interest
- People: the participants who engage in discussion about the topic
- Network: the community structure formed around the topic discussion

Rather than limiting the analysis to one dimension, we propose a multidimensional approach.

Case study on Twitter


EARLIER APPROACHES

Content: Topic of Interest

OR

People: Participant of the discussion

OR

Network: Community around topic

Images: tupper-lake.com/.../uploads/Community.jpg http://www.iconarchive.com/show/people-icons-by-aha-soft/user-icon.html


OUR APPROACH

Content: Topic of Interest

AND

People: Participant of the discussion

AND

Network: Community around topic



PROBLEM FORMALIZATION: Terminology

Event-Oriented Community: an implicit group of social network users who have joined the discussion (by posting messages) on a topic about an event.

Slice: a collection of messages relevant to the topic of discussion, posted during a fixed-length time window.

Snapshot: the state of the network at a certain point in time, at which user profile and connection information are crawled.

Active Window: freshness matters!

Active Community


PROBLEM FORMALIZATION: Definition

Binary classification problem for user-community link prediction.

User Engagement Prediction Problem:

Given
1) an event-oriented community C formed around a topic of discussion;
2) a Twitter user U ε C,

predict whether U will be engaged in C (by composing a new tweet, or retweeting an existing tweet, that contains keywords or hashtags related to C's underlying event) in a future slice. If so, U is a positive record; otherwise, U is a negative record.
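The labeling rule in the definition above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Tweet` type, the function names, and the use of case-insensitive substring matching for "contains keywords or hashtags" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical minimal tweet record: who posted it and its text."""
    author: str
    text: str


def is_engaged(user, future_slice, event_terms):
    """Return True if `user` posted (tweet or retweet) at least one message
    in the future slice that contains an event keyword or hashtag.

    Matching is a case-insensitive substring test (an assumption).
    """
    terms = [term.lower() for term in event_terms]
    for tweet in future_slice:
        if tweet.author != user:
            continue
        text = tweet.text.lower()
        if any(term in text for term in terms):
            return True
    return False


def label_records(candidates, future_slice, event_terms):
    """Map each candidate user to 1 (positive record) or 0 (negative record)."""
    return {u: int(is_engaged(u, future_slice, event_terms)) for u in candidates}
```

A user who posts nothing in the slice, or posts only off-topic messages, comes out as a negative record.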


ANALYSIS FRAMEWORK: PEOPLE-CONTENT-NETWORK ANALYSIS (PCNA)


EXPERIMENTS: Dataset & Event Categorization

Study on Twitter data

Events have various characteristics, and we hypothesize that user engagement for them is affected by different variables.

No standard event categorization is available, so we categorize the events by observing data over time, as follows:

- Global (G) vs. Local (L) [e.g., Japan Earthquake vs. Iowa State Fair]
- Deterministic (D) vs. Unexpected (U) [e.g., Emmy Awards vs. Japan Earthquake]
- Compact (C) vs. Loose (Ls) [e.g., ISWC conference vs. Japan Earthquake]
- Transient (T) vs. Lasting (Lt) [e.g., President’s Speech vs. Egypt Revolution]


EVENTS (LABELED AS PER OUR CATEGORIZATION)

ClevelandShowPremiere: Second season premiere of the animated TV series Cleveland Show. September 26. Global, loose, deterministic, transient.

DiscoveryBuildingCrisis: Hostage crisis at the headquarters of Discovery Channel, Maryland. September 1. Local, loose, unexpected, transient.

EmmyAwards: 62nd Prime-time Emmy Awards. August 29. Global, loose, deterministic, lasting.

GoogleInstantSearch: Launch of Google Instant in the United States. September 8. Global, loose, unexpected, transient.

HeismanTrophy: Reggie Bush’s announcement to forfeit the 2005 Heisman Trophy. September 14. Local, compact, unexpected, lasting.

IowaStateFair: Iowa State Fair. August 12-22. Local, loose, deterministic, lasting.

JewishNewYear: Jewish New Year 5771. September 8-10. Global, compact, deterministic, transient.

LindsayLohanHearing: Lindsay Lohan’s hearing on probation revocation and verdict. September 24. Local, loose, deterministic, transient.

LinuxCon: Annual convention organized by the Linux Foundation. August 10-12. Global, compact, deterministic, lasting.

LondonTubeStrike: London tube strike. September 6. Local, loose, deterministic, transient.

RichCroninDeath: Death of singer and songwriter Rich Cronin. September 8. Local, loose, unexpected, transient.

ScottPilgrimRelease: Release of the movie Scott Pilgrim vs. the World. August 13. Global, loose, deterministic, lasting.

SESSanFrancisco: Search Engine Strategies 2010 at San Francisco. August 16-20. Global, compact, deterministic, lasting.

StuxnetWorm: Confirmation of the Stuxnet worm attack on the Iranian nuclear program. September 24. Global, loose, unexpected, lasting.
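One way to hold these labels for later per-category analysis is a small record type with one field per axis. The sketch below encodes four of the events exactly as labeled above; the `Event` class, its field names (`scope`, `cohesion`, `expectedness`, `duration`), and the `select` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One event with its label on each of the four categorization axes."""
    name: str
    scope: str         # "global" or "local"
    cohesion: str      # "compact" or "loose"
    expectedness: str  # "deterministic" or "unexpected"
    duration: str      # "transient" or "lasting"


# A subset of the events, with labels copied from the slide.
EVENTS = [
    Event("DiscoveryBuildingCrisis", "local", "loose", "unexpected", "transient"),
    Event("EmmyAwards", "global", "loose", "deterministic", "lasting"),
    Event("LinuxCon", "global", "compact", "deterministic", "lasting"),
    Event("StuxnetWorm", "global", "loose", "unexpected", "lasting"),
]


def select(events, **criteria):
    """Return names of events whose fields match every given criterion."""
    return [
        e.name
        for e in events
        if all(getattr(e, field) == value for field, value in criteria.items())
    ]
```

For example, `select(EVENTS, expectedness="unexpected")` picks out the unexpected events, which is the kind of grouping needed to compare results across event types.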


EXPERIMENTS: Features

Organized in the PCNA framework: Node/Author features (P), Content features (C), Community features (N)

Extracted for each potential community member (U) in each slice, where U belongs to the union of the follower lists of all active community members.

[Figure: the Whole Topic Community containing the Active Community, a Potential new Member (U), and a Followee of (U). EDGE: A and B are linked if B follows A.]
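The candidate set described on this slide (the union of follower lists of active community members) can be sketched with plain sets. The `followers_of` mapping and both function names are hypothetical, and excluding users who are already active is an assumption the slide does not state, so it is exposed as a flag.

```python
def candidate_members(active_members, followers_of, exclude_active=True):
    """Union of the follower lists of all active community members.

    followers_of: dict mapping a user to the set of users who follow them.
    exclude_active: drop users who are already active members (an assumption;
    the slide only specifies the union of follower lists).
    """
    candidates = set()
    for member in active_members:
        candidates.update(followers_of.get(member, ()))
    if exclude_active:
        candidates -= set(active_members)
    return candidates


def active_friends(user, active_members, followers_of):
    """Active community members that `user` follows (the followees of U)."""
    return {m for m in active_members if user in followers_of.get(m, ())}
```

Features for a candidate U would then be computed only from `active_friends(U, ...)`, i.e. the followees of U inside the active community.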


EXPERIMENTS: Features (cont.)

- Community features [Characteristics of the active community/network under consideration]:
  wccSize: size of the weakly-connected component (WCC) that U’s friends belong to in the active network.
  wccPercent: ratio of wccSize to the size of the active network.
  connectivity: number of active friends (i.e., followees) in the community.
  communitySize: size of the active community.

- Author features [Characteristics of friends that U is following]: only friends in the active community are considered.
  logFollower: logarithm of follower count
  logFollowee: logarithm of followee count
  Klout [1]: an integrated measure of user influence and popularity
  Other profile information and activity history [2].

[1] http://www.klout.com
[2] Future work
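The four community features can be computed from the follow edges among active members. The sketch below is an illustration under stated assumptions, not the authors' implementation: when U's active friends fall into more than one weakly-connected component it takes the largest such component, and it treats the size of the active network as the number of active members. Function names are hypothetical.

```python
from collections import defaultdict, deque


def weakly_connected_components(nodes, edges):
    """Return the weakly-connected components (edge direction ignored)
    of the graph restricted to `nodes`, as a list of sets."""
    node_set = set(nodes)
    neighbors = defaultdict(set)
    for a, b in edges:
        if a in node_set and b in node_set:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, components = set(), []
    for start in node_set:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def community_features(friends_of_u, active_members, follow_edges):
    """Compute wccSize, wccPercent, connectivity, and communitySize for U."""
    members = set(active_members)
    friends = set(friends_of_u) & members  # only active friends count
    components = weakly_connected_components(members, follow_edges)
    # Assumption: use the largest WCC that contains any active friend of U.
    wcc_size = max((len(c) for c in components if c & friends), default=0)
    size = len(members)
    return {
        "wccSize": wcc_size,
        "wccPercent": wcc_size / size if size else 0.0,
        "connectivity": len(friends),
        "communitySize": size,
    }
```

A breadth-first search is used here only to keep the example dependency-free; a graph library would do the same job.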


EXPERIMENTS: Features (cont.)

- Content features [Characteristics of tweets posted by active friends of U]:
  keywords: number of event-relevant keywords
  hashtags: number of event-relevant hashtags
  retweet: number of retweets
  mention: number of mentions
  url: number of relevancy-adjusted hyperlinks (an irrelevant hyperlink is given the value -1)
  subjectivity: subjectivity scores for words and emoticons
  Linguistic cues (LIWC [1] analysis): features for language usage. The top-3 transformed features from Principal Component Analysis (PCA) are extracted.

[1] http://www.liwc.net
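The count-based content features above can be sketched with regular expressions. This is a simplified illustration under assumptions: tweets are plain strings, a retweet is detected by a leading "RT" token (the convention common at the time), and keyword matching is by whole lowercase word. The url, subjectivity, and LIWC features are omitted because the slides do not specify how they are scored. The function name is hypothetical.

```python
import re


def content_features(tweets, event_keywords, event_hashtags):
    """Count event keywords, event hashtags, retweets, and mentions
    across the tweets posted by U's active friends."""
    keywords = {k.lower() for k in event_keywords}
    hashtags = {h.lower().lstrip("#") for h in event_hashtags}
    counts = {"keywords": 0, "hashtags": 0, "retweet": 0, "mention": 0}
    for text in tweets:
        lowered = text.lower()
        words = re.findall(r"[a-z0-9']+", lowered)
        counts["keywords"] += sum(1 for w in words if w in keywords)
        tags = re.findall(r"#(\w+)", lowered)
        counts["hashtags"] += sum(1 for t in tags if t in hashtags)
        if re.match(r"\s*rt\b", lowered):  # assumption: "RT ..." marks a retweet
            counts["retweet"] += 1
        counts["mention"] += len(re.findall(r"@\w+", lowered))
    return counts
```

The resulting dictionary uses the same feature names as the slide, so it can be merged with the author and community features for a candidate U.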


WAIT A MINUTE!

Not all content has been viewed!
- Novelty and attention: a user is likely to see new or recent content/tweets and then join the community
- Apply temporal weighting to the features

Dataset imbalance: too many negative records!
- Alleviated by the SMOTE method
- Over-sampling of positive records and under-sampling of negative ones

Not all users are active!
- Apply weighting on activity level based on last activity [1]

[1] Future work
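The rebalancing step mentioned above can be sketched in a few lines. SMOTE creates synthetic minority (positive) records by interpolating between a minority record and one of its nearest minority neighbours; the majority (negative) class is thinned by random under-sampling. The code below is a simplified, dependency-free sketch: the function names, `k`, and the seeds are assumptions, and the slides do not give the sampling ratios that were used.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points, each on the line segment between a
    random minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority records to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority records
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Keep a random subset of `n_keep` majority records."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)
```

Because every synthetic record lies between two real positive records, over-sampling adds variety without simply duplicating rows.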


EXPERIMENTS

We run the following experiment groups:

- allFeatures (All): contains all three feature groups
- onlyContent (Con.): contains only content features
- onlyAuthor (Aut.): contains only author features
- onlyCommunity (Com.): contains only community features

SVM classifier: LibSVM, RBF kernel, gamma=8, c=32
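The four experiment groups amount to selecting different feature columns before training, and the RBF kernel used by LibSVM is K(x, y) = exp(-gamma * ||x - y||^2). The sketch below illustrates both with gamma=8 as on the slide. The dictionaries and function names are hypothetical, the three LIWC PCA features are left out because the slides do not name them, and no SVM is actually trained here (c=32 is the soft-margin penalty passed to the trainer).

```python
import math

# Feature names as listed on the feature slides, grouped by PCNA dimension.
FEATURE_GROUPS = {
    "community": ["wccSize", "wccPercent", "connectivity", "communitySize"],
    "author": ["logFollower", "logFollowee", "klout"],
    "content": ["keywords", "hashtags", "retweet", "mention", "url", "subjectivity"],
}

# Which feature groups each experiment uses.
EXPERIMENTS = {
    "allFeatures": ["community", "author", "content"],
    "onlyContent": ["content"],
    "onlyAuthor": ["author"],
    "onlyCommunity": ["community"],
}


def select_features(record, experiment):
    """Return the feature vector of `record` (a dict) for one experiment."""
    names = [n for group in EXPERIMENTS[experiment] for n in FEATURE_GROUPS[group]]
    return [record[n] for n in names]


def rbf_kernel(x, y, gamma=8.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

With a gamma this large the kernel value drops off quickly with distance, so feature scaling matters before training.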


EXPERIMENTS: Results

Summary of Prediction Accuracy (%). Statistically significant results are in bold.

[Table: prediction accuracy by event and Event-Type; the values are not preserved in this transcript.]


INSIGHTS

The onlyCommunity classifiers perform worst.
- The latent nature of network features makes them difficult for a user to perceive directly.

The onlyContent classifiers perform best among the single feature groups.
- Some users end up participating in a discussion after observing information on the public timeline; these ad-hoc users are therefore hard to observe via network analysis alone.
- Content engages by its quality and nature (information sharing, a call for action, or crowdsourcing). For example, a link to an image or video (evidential content) about Reggie Bush's surrender of the Heisman Trophy in September 2010 is likely to provoke many more thoughts in a user's mind and lead them to engage in the discussion.


INSIGHTS (Cont.)

The onlyAuthor classifiers perform comparably to the onlyContent classifiers for some of the topics.
- Impact of the effective presence of influential people in the discussion group
- Insufficiency in content features, reflected by low average connectivity, can be compensated by author features (e.g., Rich Cronin Death).

Statistical significance testing shows that the allFeatures classifiers perform better than or equivalently to any single feature group classifier for 12 out of 14 topics.
- The advantage of using all features is dominant where the degree of randomness in individual dimensions can be really high (e.g., Discovery Building attack).


INSIGHTS (Cont.)

No significant correlation between the selection of feature groups and the event types lasting vs. transient.
- Possibility of a shift in the characteristics over time

The advantage of allFeatures over the other feature groups is generally stronger on unexpected topics than on deterministic ones.
- The degree of randomness is high in discussions surrounding unexpected events


CONCLUSION & FUTURE WORK

No single dimension (People, Content, Network) can be expected to perform well in all types of topic discussions; hence there is a strong need to study the dynamics of user engagement using the PCNA framework.

Future work:
- Experiments with a more refined taxonomy of event types and user engagement factors, taking into account the shift in event characteristics over time
- Semantic analysis of content to enhance the content features
- Experiments on other social networks: forums, DBLP


QUESTIONS?

Paper at: http://knoesis.org/library/resource.php?id=1095

More on Social Media @ Kno.e.sis at http://knoesis.org/research/semweb/projects/socialmedia/

