Date post: 29-Dec-2015
Data Mining and Machine Learning Lab Network Denoising in Social Media Huiji Gao, Xufei Wang, Jiliang Tang, and Huan Liu Data Mining and Machine Learning Lab Arizona State University
Page 1: Network Denoising in Social Media

Huiji Gao, Xufei Wang, Jiliang Tang, and Huan Liu

Data Mining and Machine Learning Lab, Arizona State University

Page 2: Network Denoising in Social Media

Social media extends the physical boundaries of user relationships.

Two-thirds of online adults use social networking sites. [1]

83% of teens are members of at least one social network. [1]

Total time spent on social media has increased from 88 billion minutes to 121 billion minutes. [2]

[1]. Nielsen, “State of the media: The social media report 2012,” 2012.

[2]. M. Duggan and J. Brenner, “The demographics of social media users,” Pew Internet & American Life Project, 2012.

Page 3: Network Denoising in Social Media

Average number of friends per user on Facebook [1]

  18-24-year-olds account for the majority of Facebook users and have 510 Facebook friends on average.

Average number of followers per user on Twitter: 208 [2]

Dunbar’s number [3]: 100-150 per user

[1]. http://www.marketingcharts.com/wp/direct/18-24-year-olds-on-facebook-boast-an-average-of-510-friends-28353/

[2]. By the number: 20 Amazing Twitter Stats: http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats

[3]. R. Dunbar, How Many Friends Does One Person Need? Dunbar’s Number and Other Evolutionary Quirks. Faber and Faber, 2010.

Page 4: Network Denoising in Social Media

Strong Ties vs. Weak Ties

Advantages of Weak Ties
  Recommender systems (job hunting)

Challenges Brought by Weak Ties
  Difficulty in managing friends
  Noise introduced into certain tasks for studying user behavior

Page 5: Motivation

Propose to de-noise an individual’s social network for behavior inference by removing noisy links.

Noisy links
  Defined at the user level
  Connections that are unimportant or that result in worse performance when inferring a user’s behavior

Applications (Benefits) of Identifying Noise (De-noising)
  Efficiently infer user behavior in online social media: Behavior Inference, Community Detection, Viral Marketing
  Better manage a user’s friendships: Group Friends, Control Privacy, Defriend, etc.

Page 6: Challenges of Identifying Noisy Links

A connection between two users provides limited information about the tie strength.

Online information such as profiles can be incomplete.

Offline behaviors are usually unavailable in social media.

Page 7: Identify Noisy Links

Introduce social interactions for de-noising.

  Homophily theory suggests that similarity breeds connections.
  Like-minded users tend to have similar interactions.
  A higher degree of embeddedness suggests stronger tie strength.

[Figure: Social Interactions, User Similarity, Tie Strength, Network De-noising]
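The embeddedness signal on this page can be made concrete with a minimal sketch. It assumes embeddedness is measured as the number of friends two users have in common, which is a common definition but is not spelled out in the slides. The toy graph and the function name `embeddedness` are illustrative.

```python
def embeddedness(friends, u, v):
    """Number of friends that users u and v have in common."""
    return len(friends[u] & friends[v])


# Toy undirected friendship graph (hypothetical data): user -> set of friends.
friends = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

# A pair with shared friends versus a pair with none.
for u, v in [("a", "b"), ("d", "e")]:
    print(u, v, embeddedness(friends, u, v))
```

Under this reading, a link such as d-e, whose endpoints share no friends, is a weaker candidate tie than a-b.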

Page 8: Methodology

User Social Interactions (e.g., Tagging)

User Feature Space
  User Feature f_i
  Neighborhood Feature N_i

w_{i,j}: the weight between u_i and his friend j.
w_i ≥ 0: weight vector (tie strength) between u_i and his social network

[Figure: a tag-by-user interaction matrix (tags T1 … Tm, users U1 … Un). The columns of u_i's friends (U1, U5, U7 in the example) form N_i, the column of u_i forms f_i, and the example weights in w_i are 0.5, 0, and 1.]

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \| N_i w_i - f_i \|_2^2 + \lambda \| w_i \|_1 \right)$$
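The per-user objective on this page can be sketched as a small non-negative, l1-regularized least-squares problem. The slides do not say which solver the authors use; the projected proximal-gradient loop below is a swapped-in standard technique, and the function name `denoise_user`, the toy matrix, and the value of `lam` are assumptions made for illustration.

```python
import numpy as np


def denoise_user(N, f, lam=0.1, iters=2000):
    """Approximately solve min_{w >= 0} ||N w - f||_2^2 + lam * sum(w)."""
    w = np.zeros(N.shape[1])
    # The gradient of ||N w - f||^2 is Lipschitz with constant 2 * sigma_max(N)^2.
    lipschitz = 2.0 * np.linalg.norm(N, 2) ** 2
    step = 1.0 / lipschitz if lipschitz > 0 else 1.0
    for _ in range(iters):
        grad = 2.0 * N.T @ (N @ w - f)
        # Gradient step, constant shift for the l1 term, then projection onto w >= 0.
        w = np.maximum(0.0, w - step * (grad + lam))
    return w


# Hypothetical data: rows are tags, columns are the three friends of user u_i.
N = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
f = np.array([1.0, 1.0, 0.0, 0.0])  # tags of u_i; identical to the first friend

w = denoise_user(N, f, lam=0.1)
print(np.round(w, 3))
```

In this toy case the first friend, whose tags match those of u_i, keeps a weight close to 1, while the other two weights are driven to exactly 0. Under the slides' reading of w_i as tie strength, those zero-weight links would be the candidates for removal.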

Page 9: Problem Statement

Integrate multiple types of social interactions for denoising social networks

Linking
  Forming connections with other users
  user-user adjacency matrix

Tagging
  Subscribing to tags
  user-tag subscribing matrix

Commenting
  Making comments to other users
  user-user commenting matrix

[Figure: three example interaction matrices: users U1 … Un by users (Linking), tags T1 … Tm by users (Tagging), and users by users (Commenting).]
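The three interaction matrices can be built from raw interaction records along the following lines. This is a sketch under stated assumptions: friendships are treated as undirected, comments are stored as counts from commenter to recipient, the tag matrix is laid out tags-by-users as in the figure, and all names and records are hypothetical.

```python
import numpy as np

users = ["u1", "u2", "u3"]
tags = ["t1", "t2"]
uid = {u: k for k, u in enumerate(users)}
tid = {t: k for k, t in enumerate(tags)}

# Hypothetical raw records.
links = [("u1", "u2"), ("u2", "u3")]                          # friendships
subscriptions = [("u1", "t1"), ("u2", "t1"), ("u3", "t2")]    # (user, tag)
comments = [("u1", "u2"), ("u1", "u2"), ("u3", "u2")]         # (commenter, recipient)

n, m = len(users), len(tags)

# Linking: symmetric user-user adjacency matrix.
A = np.zeros((n, n))
for a, b in links:
    A[uid[a], uid[b]] = 1.0
    A[uid[b], uid[a]] = 1.0

# Tagging: tag-by-user subscription matrix.
T = np.zeros((m, n))
for u, t in subscriptions:
    T[tid[t], uid[u]] = 1.0

# Commenting: user-user matrix of comment counts.
C = np.zeros((n, n))
for src, dst in comments:
    C[uid[src], uid[dst]] += 1.0

print(A, T, C, sep="\n\n")
```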

Page 10: Problem Statement

[Figure: the three interaction matrices (Linking, Tagging, Commenting).]

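Page 11: Problem Statement

[Figure: the three interaction matrices (Linking, Tagging, Commenting).]

One objective per interaction type k = 1, 2, 3:

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \alpha_1^2 \| N_i^1 w_i - f_i^1 \|_2^2 + \lambda \| w_i \|_1 \right)$$

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \alpha_2^2 \| N_i^2 w_i - f_i^2 \|_2^2 + \lambda \| w_i \|_1 \right)$$

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \alpha_3^2 \| N_i^3 w_i - f_i^3 \|_2^2 + \lambda \| w_i \|_1 \right)$$

Combined objective over the three interaction types:

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \alpha_1^2 \| N_i^1 w_i - f_i^1 \|_2^2 + \alpha_2^2 \| N_i^2 w_i - f_i^2 \|_2^2 + \alpha_3^2 \| N_i^3 w_i - f_i^3 \|_2^2 + \lambda \| w_i \|_1 \right)$$

(α_k denotes the weight of interaction type k; the original symbol is not legible in this transcript.)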

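Page 12: Problem Statement

[Figure: the three interaction matrices (Linking, Tagging, Commenting) and the columns selected for user U_i.]

Combined objective over the three interaction types:

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \alpha_1^2 \| N_i^1 w_i - f_i^1 \|_2^2 + \alpha_2^2 \| N_i^2 w_i - f_i^2 \|_2^2 + \alpha_3^2 \| N_i^3 w_i - f_i^3 \|_2^2 + \lambda \| w_i \|_1 \right)$$

Stacking the three terms into one system, and using $\| w_i \|_1 = e^T w_i$ for $w_i \ge 0$, where e is the all-ones vector:

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \left\| \begin{pmatrix} \alpha_1 N_i^1 \\ \alpha_2 N_i^2 \\ \alpha_3 N_i^3 \end{pmatrix} w_i - \begin{pmatrix} \alpha_1 f_i^1 \\ \alpha_2 f_i^2 \\ \alpha_3 f_i^3 \end{pmatrix} \right\|_2^2 + \lambda e^T w_i \right)$$

In compact form:

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \| \mathcal{N}_i w_i - F_i \|_2^2 + \lambda e^T w_i \right)$$

where l is the number of interaction types and

$$\mathcal{N}_i = \left( \alpha_1 {N_i^1}^T, \alpha_2 {N_i^2}^T, \ldots, \alpha_l {N_i^l}^T \right)^T$$

$$F_i = \left( \alpha_1 {f_i^1}^T, \alpha_2 {f_i^2}^T, \ldots, \alpha_l {f_i^l}^T \right)^T$$

(α_k denotes the weight of interaction type k; the original symbol is not legible in this transcript.)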
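The stacking step on this page rests on a simple identity: a weighted sum of squared residuals over several interaction types equals the squared residual of one vertically stacked system. The sketch below checks that identity numerically. The per-type weights `alphas`, the helper name `stack_interactions`, and the random toy data are assumptions for illustration, not values from the slides.

```python
import numpy as np


def stack_interactions(Ns, fs, alphas):
    """Stack per-interaction neighborhood matrices and feature vectors,
    scaling block k by alphas[k]."""
    N = np.vstack([a * Nk for a, Nk in zip(alphas, Ns)])
    F = np.concatenate([a * fk for a, fk in zip(alphas, fs)])
    return N, F


rng = np.random.default_rng(0)
# Three interaction types with different feature dimensions, same three friends.
Ns = [rng.random((4, 3)), rng.random((5, 3)), rng.random((3, 3))]
fs = [rng.random(4), rng.random(5), rng.random(3)]
alphas = [1.0, 0.5, 2.0]
w = rng.random(3)

N, F = stack_interactions(Ns, fs, alphas)
stacked = np.sum((N @ w - F) ** 2)
summed = sum(a ** 2 * np.sum((Nk @ w - fk) ** 2)
             for a, Nk, fk in zip(alphas, Ns, fs))
print(np.isclose(stacked, summed))
```

Because the two quantities agree for any w, a solver written for a single interaction type can be reused unchanged on the stacked matrix and vector.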

Page 13: Experiments

Datasets
  BlogCatalog; Flickr; BlogMI

Evaluation Approach
  Comparing the performance of behavioral inference on the network before and after de-noising

Page 14: Experiments

Evaluation Approach

Labels representing user behavior
  Category information on BlogCatalog
  User group information on Flickr

Behavior Inference
  Construct two user friendship networks: Denoising vs. Non-Denoising
  Extract social dimensions for each user on the two networks
  Apply a supervised learning approach to infer user behavior

Evaluation metric: F-Measure
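For reference, the F-measure is the harmonic mean of precision and recall. The slides do not say which variant (for example micro- or macro-averaged over labels) is reported, so the sketch below shows only the plain single-label form; the function name and toy labels are illustrative.

```python
def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f_measure(y_true, y_pred)
print(round(score, 3))
```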

Page 15: Experiments

Evaluation Approach

Comparing the performance of behavioral inference before and after de-noising:

  Whether de-noising can maintain or improve the performance.
  Whether de-noising can make the social media networks more compact.
  Whether de-noising can accomplish the same behavior inference task faster.

Page 16: Experiments

Denoising vs. No Denoising

Page 17: Experiments

Denoising Performance with λ

$$\min_{w_i \ge 0} \sum_{i=1}^{n} \left( \| \mathcal{N}_i w_i - F_i \|_2^2 + \lambda e^T w_i \right)$$
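One property of λ is worth making concrete. Assuming the per-user objective is ||N w - f||_2^2 + λ e^T w with w ≥ 0, the optimality conditions at w = 0 require every component of -2 N^T f + λ e to be non-negative. The all-zero weight vector is therefore optimal exactly when λ ≥ 2 max_j (N^T f)_j, and any λ at or above that threshold removes every link of the user. The helper name `lambda_max` and the toy data are illustrative.

```python
import numpy as np


def lambda_max(N, f):
    """Smallest lam for which w = 0 minimizes ||N w - f||^2 + lam * sum(w), w >= 0."""
    return max(0.0, 2.0 * float(np.max(N.T @ f)))


# Hypothetical data: rows are tags, columns are the friends of one user.
N = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
f = np.array([1.0, 1.0, 0.0, 0.0])

lam_star = lambda_max(N, f)
# Gradient of the objective at w = 0 for lam = lam_star; all entries are >= 0.
grad_at_zero = -2.0 * N.T @ f + lam_star
print(lam_star, grad_at_zero)
```

A sweep over λ for a given user would then sensibly stay below this value, since anything above it yields the same empty neighborhood.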

Page 18: Experiments

Denoising with Multiple Interactions

Page 19: Experiments

Link Reduction Analysis

Flickr users share more similarities with their social networks.
  On average, a Flickr user has 178 tags, while BlogCatalog and BlogMI users have only 8 tags.
  On average, a Flickr user shares 24.43 tags with a neighbor, while BlogCatalog and BlogMI users share only 0.27 and 0.24 tags, respectively.

Flickr users have fewer friends compared to the users on the other two datasets.
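A statistic like "tags shared with a neighbor" can be computed as sketched below. The averaging convention is an assumption: each directed user-neighbor pair counts once, which may differ from how the slide's figures were obtained. The data and the function name are hypothetical.

```python
def avg_shared_tags(user_tags, friends):
    """Average number of tags a user shares with a neighbor,
    taken over all (user, neighbor) pairs."""
    total, pairs = 0, 0
    for u, neighbors in friends.items():
        for v in neighbors:
            total += len(user_tags[u] & user_tags[v])
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy data: user -> set of tags, user -> set of friends.
user_tags = {"a": {"x", "y", "z"}, "b": {"x", "y"}, "c": {"q"}}
friends = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}

print(avg_shared_tags(user_tags, friends))
```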

Page 20: Experiments

Advantages of Denoising Social Networks

Page 21: Discussion

Limitations of Denoising

Task oriented
  A “noisy user” is considered to be a user who holds irrelevant or opposite interests to the target user.
  Denoising provides a purer environment that better presents the homophily effect with regard to the behavioral inference task.

Loss of negative information
  Analogous to a user-user network with trust information (positive information) only, without observing distrust information (negative information).
  Denoising may reduce the performance of tasks like item recommendation.

Page 22: Conclusion

Propose an efficient approach to de-noise social networks for behavior inference by utilizing multiple types of interactions in social media.

The advantages of de-noising social networks are verified via the performance of behavioral inference, link reduction, and time efficiency.

Network de-noising not only improves performance but also makes social networks more compact for efficient social media mining.

Page 23: Future Work

Integrating all vs. selectively integrating
  Linking + Tagging vs. Linking + Tagging + Commenting

Interactions across social media
  Facebook: Liking
  Twitter: Re-tweeting
  LinkedIn: Connecting
  Foursquare: Check-in

Page 24: Acknowledgments

Co-authors and members in DMML@ASU

ONR (Office of Naval Research)

Page 25: Questions

