Network Denoising in Social Media
Huiji Gao, Xufei Wang, Jiliang Tang, and Huan Liu
Data Mining and Machine Learning Lab, Arizona State University
Social media extends the physical boundary of user relationships
Two-thirds of online adults use social networking sites. [1]
83% of teens are members of at least one social network. [1]
Total time spent on social media has increased from 88 billion minutes to 121 billion minutes. [2]
[1]. Nielsen, “State of the media: The social media report 2012,” 2012.
[2]. M. Duggan and J. Brenner, “The demographics of social media users,” Pew Internet & American Life Project, 2012.
Average number of friends per user on Facebook [1]
18-24-year-olds make up the majority of Facebook users, with an average of 510 Facebook friends.
Average number of followers per user on Twitter: 208 [2]
Dunbar’s number [3]: 100-150 per user
[1]. http://www.marketingcharts.com/wp/direct/18-24-year-olds-on-facebook-boast-an-average-of-510-friends-28353/
[2]. By the Numbers: 20 Amazing Twitter Stats: http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats
[3]. R. Dunbar, How Many Friends Does One Person Need? Dunbar’s Number and Other Evolutionary Quirks. Faber and Faber, 2010.
Motivation
Strong Ties vs. Weak Ties
Advantages of weak ties: recommender systems (e.g., job hunting)
Challenges brought by weak ties:
  Difficulty in managing friends
  Noise introduced into certain tasks for studying user behavior
Proposal: de-noise an individual’s social network for behavior inference by removing noisy links.
Noisy links
  Defined at the user level
  Connections that are unimportant, or that worsen performance when inferring a user’s behavior
Applications (Benefits) of Identifying Noise (De-noising)
  Behavior inference: efficiently infer user behavior in online social media, community detection, viral marketing
  Better management of a user’s friendships: group friends, control privacy, defriend, etc.
Challenges of Identifying Noisy Links
A connection between two users provides limited information about tie strength.
Online information such as profiles can be incomplete.
Offline behaviors are usually unavailable in social media.
Identify Noisy Links
Introduce social interactions for de-noising.
Homophily theory suggests that similarity breeds connections.
Like-minded users tend to have similar interactions.
A higher degree of embeddedness suggests stronger tie strength.
Social Interactions -> User Similarity -> Tie Strength -> Network De-noising
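Methodology
User social interactions (e.g., tagging) form the user feature space: a matrix with one row per tag (T1, T2, T3, …, Tm) and one column per user (U1, U2, U3, …, Un), where an x marks an interaction.
User feature $f_i$: the column of user $u_i$ in the user feature space.
Neighborhood feature $N_i$: the columns of $u_i$'s friends (e.g., U1, U5, U7).
$w_{i,j}$: the weight between $u_i$ and his friend $j$.
$w_i \ge 0$: the weight vector (tie strength) between $u_i$ and his social network (e.g., 0.5, 0, 1).
$$\min \sum_{i=1}^{n} \left( \|N_i w_i - f_i\|_2^2 + \lambda \|w_i\|_1 \right) \quad \text{s.t. } w_i \ge 0$$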
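The slide does not say how the per-user objective is solved. As a rough illustration only, the sketch below assumes the objective is a non-negative, L1-regularized least-squares problem, $\|N_i w_i - f_i\|_2^2 + \lambda \|w_i\|_1$ with $w_i \ge 0$, and minimizes it with projected gradient descent, which is an assumed solver choice. The function name `denoise_weights`, the toy tag data, and `lam=0.1` are hypothetical.

```python
def denoise_weights(N, f, lam, iters=2000):
    """Sketch: minimize ||N w - f||^2 + lam * sum(w) subject to w >= 0.

    N is a list of rows (one row per tag, one column per friend) and f is the
    target user's feature vector. Projected gradient descent is an assumed
    solver; the slides do not name one.
    """
    rows, cols = len(N), len(N[0])
    # 2 * squared Frobenius norm bounds the Lipschitz constant of the gradient.
    lipschitz = 2.0 * sum(v * v for row in N for v in row)
    step = 1.0 / lipschitz if lipschitz > 0 else 1.0
    w = [0.0] * cols
    for _ in range(iters):
        residual = [sum(N[r][j] * w[j] for j in range(cols)) - f[r]
                    for r in range(rows)]
        grad = [2.0 * sum(N[r][j] * residual[r] for r in range(rows)) + lam
                for j in range(cols)]
        # Gradient step followed by projection onto the constraint w >= 0.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(cols)]
    return w


# Hypothetical toy data: 4 tags (rows) and 3 friends (columns).
# Friend 0 uses exactly the user's tags, friend 1 uses only an unrelated tag,
# and friend 2 shares a single tag with the user.
f = [1.0, 1.0, 0.0, 1.0]
N = [[1.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0]]
w = denoise_weights(N, f, lam=0.1)
print([round(x, 4) for x in w])
```

On this toy input the weight of the like-minded friend stays close to 1, while the other two weights are driven to 0, which is the behavior the tie-strength vector is meant to capture.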
Problem Statement
Integrate multiple types of social interactions for denoising social networks:
Linking: forming connections with other users (user-user adjacency matrix, U1…Un by U1…Un)
Tagging: subscribing to tags (user-tag subscribing matrix, T1…Tm by U1…Un)
Commenting: making comments to other users (user-user commenting matrix, U1…Un by U1…Un)
In each matrix, an x marks an observed interaction.
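To make the matrix view concrete, here is a minimal sketch of how a user feature and a neighborhood feature might be read off one interaction type. It assumes binary interactions stored as sets; the helper names `feature_vector` and `neighborhood_matrix` and all user and tag identifiers are hypothetical.

```python
def feature_vector(user, items_by_user, vocabulary):
    """Binary column of the interaction matrix for one user."""
    owned = items_by_user.get(user, set())
    return [1.0 if item in owned else 0.0 for item in vocabulary]


def neighborhood_matrix(friends, items_by_user, vocabulary):
    """Matrix with one row per item and one column per friend."""
    columns = [feature_vector(fr, items_by_user, vocabulary) for fr in friends]
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


# Hypothetical data: who links to whom, and who subscribes to which tags.
links = {"u1": ["u5", "u7"]}
tags_by_user = {"u1": {"t1", "t2"}, "u5": {"t1", "t2", "t3"}, "u7": {"t3"}}
tag_vocabulary = ["t1", "t2", "t3"]

f_u1 = feature_vector("u1", tags_by_user, tag_vocabulary)
N_u1 = neighborhood_matrix(links["u1"], tags_by_user, tag_vocabulary)
print(f_u1)   # [1.0, 1.0, 0.0]
print(N_u1)   # [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
```

The same helpers would apply to linking or commenting by passing the set of users each user links to or comments on, with the user list as the vocabulary.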
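Problem Statement
Each interaction type $k$ (linking, tagging, commenting; $k = 1, 2, 3$) has its own user feature $f_i^k$, neighborhood feature $N_i^k$, and objective:
$$\min \sum_{i=1}^{n} \left( \alpha_k^2 \|N_i^k w_i - f_i^k\|_2^2 + \lambda \|w_i\|_1 \right) \quad \text{s.t. } w_i \ge 0$$
Combining the three interaction types, where $\alpha_k$ weights interaction type $k$:
$$\min \sum_{i=1}^{n} \left( \alpha_1^2 \|N_i^1 w_i - f_i^1\|_2^2 + \alpha_2^2 \|N_i^2 w_i - f_i^2\|_2^2 + \alpha_3^2 \|N_i^3 w_i - f_i^3\|_2^2 + \lambda \|w_i\|_1 \right) \quad \text{s.t. } w_i \ge 0$$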
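Problem Statement
Combined objective over linking, tagging, and commenting:
$$\min \sum_{i=1}^{n} \left( \alpha_1^2 \|N_i^1 w_i - f_i^1\|_2^2 + \alpha_2^2 \|N_i^2 w_i - f_i^2\|_2^2 + \alpha_3^2 \|N_i^3 w_i - f_i^3\|_2^2 + \lambda \|w_i\|_1 \right) \quad \text{s.t. } w_i \ge 0$$
Stacking the weighted features:
$$\min \sum_{i=1}^{n} \left( \left\| \begin{pmatrix} \alpha_1 N_i^1 \\ \alpha_2 N_i^2 \\ \alpha_3 N_i^3 \end{pmatrix} w_i - \begin{pmatrix} \alpha_1 f_i^1 \\ \alpha_2 f_i^2 \\ \alpha_3 f_i^3 \end{pmatrix} \right\|_2^2 + \lambda e^T w_i \right) \quad \text{s.t. } w_i \ge 0$$
Compact form for $l$ interaction types:
$$\min \sum_{i=1}^{n} \left( \|N_i w_i - F_i\|_2^2 + \lambda e^T w_i \right) \quad \text{s.t. } w_i \ge 0$$
$$N_i = (\alpha_1 N_i^{1T}, \alpha_2 N_i^{2T}, \ldots, \alpha_l N_i^{lT})^T$$
$$F_i = (\alpha_1 f_i^{1T}, \alpha_2 f_i^{2T}, \ldots, \alpha_l f_i^{lT})^T$$
Here $e$ is the all-ones vector, so $e^T w_i = \|w_i\|_1$ for $w_i \ge 0$.
Figure: the linking, tagging, and commenting matrices, with the entries of user Ui highlighted in each.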
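The step from the weighted sum of per-interaction residuals to a single stacked residual can be checked numerically. The sketch below assumes each interaction type $k$ enters with a squared weight $\alpha_k^2$ outside the norm, equivalently $\alpha_k$ inside the stacked matrices; the helper names and the toy numbers are hypothetical.

```python
import math


def squared_residual(N, w, f):
    """Return ||N w - f||_2^2 for a row-major matrix N."""
    return sum((sum(a * b for a, b in zip(row, w)) - target) ** 2
               for row, target in zip(N, f))


def stack(blocks_N, blocks_f, alphas):
    """Scale each interaction type by its weight and stack the rows."""
    stacked_N, stacked_f = [], []
    for N, f, alpha in zip(blocks_N, blocks_f, alphas):
        stacked_N.extend([alpha * v for v in row] for row in N)
        stacked_f.extend(alpha * v for v in f)
    return stacked_N, stacked_f


# Hypothetical toy data: two interaction types, two friends.
N1, f1 = [[1.0, 0.0], [1.0, 1.0]], [1.0, 0.0]
N2, f2 = [[0.0, 1.0]], [1.0]
alphas = [1.0, 0.5]
w = [0.5, 0.25]

weighted_sum = sum(a * a * squared_residual(N, w, f)
                   for N, f, a in zip([N1, N2], [f1, f2], alphas))
stacked_N, stacked_f = stack([N1, N2], [f1, f2], alphas)
stacked_value = squared_residual(stacked_N, w, stacked_f)
print(weighted_sum, stacked_value)
```

Both values agree, so a solver written for a single interaction type can be reused unchanged on the stacked matrices.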
Experiments
Datasets: BlogCatalog, Flickr, BlogMI
Evaluation approach: comparing the performance of behavioral inference on the network before and after de-noising
Experiments
Evaluation Approach
Labels representing user behavior:
  Category information on BlogCatalog
  User group information on Flickr
Behavior inference:
  Construct two user friendship networks: denoised vs. non-denoised.
  Extract social dimensions for each user on the two networks.
  Apply a supervised learning approach to infer user behavior.
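The slides do not spell out how the learned weights turn into the denoised friendship network. A minimal sketch, assuming a link is dropped when its learned weight is zero (or below a small threshold) and that links are treated as directed per user, could look like this. The function name `denoise_network`, the `threshold` parameter, and the example weights are hypothetical.

```python
def denoise_network(adjacency, weights, threshold=0.0):
    """Keep only the links whose learned weight exceeds the threshold.

    adjacency maps each user to a list of friends; weights maps
    (user, friend) pairs to tie strengths. Missing weights count as 0.
    """
    return {
        user: [fr for fr in friends if weights.get((user, fr), 0.0) > threshold]
        for user, friends in adjacency.items()
    }


# Hypothetical friendship network and tie strengths.
adjacency = {"u1": ["u5", "u7", "u9"], "u5": ["u1"]}
weights = {("u1", "u5"): 0.5, ("u1", "u7"): 0.0,
           ("u1", "u9"): 1.0, ("u5", "u1"): 0.3}

denoised = denoise_network(adjacency, weights)
print(denoised)  # {'u1': ['u5', 'u9'], 'u5': ['u1']}
```

The original `adjacency` and the returned `denoised` mapping would then be the two networks on which social dimensions are extracted and compared.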
Experiments
Evaluation approach: comparing the performance of behavioral inference before and after de-noising
Performance metric: F-measure
Whether de-noising can maintain or improve the performance.
Whether de-noising can make the social media networks more compact.
Whether de-noising can accomplish the same behavior inference task faster.
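The slides name the F-measure without giving its exact variant. The sketch below assumes a micro-averaged F1 over (user, label) pairs, which is one common choice for multi-label behavior inference; the function name `f_measure` and the example labels are hypothetical.

```python
def f_measure(true_pairs, predicted_pairs):
    """Micro-averaged F1 over (user, label) pairs (assumed variant)."""
    true_pairs, predicted_pairs = set(true_pairs), set(predicted_pairs)
    true_positives = len(true_pairs & predicted_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_pairs)
    recall = true_positives / len(true_pairs)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and predictions.
truth = {("u1", "music"), ("u1", "travel"), ("u2", "sports")}
predicted = {("u1", "music"), ("u2", "sports"), ("u2", "travel")}
score = f_measure(truth, predicted)
print(round(score, 4))  # 0.6667
```

Running the same function on predictions from the original and the denoised network gives the before/after comparison described above.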
Experiments
Denoising vs. No Denoising
Experiments
Denoising Performance with λ
$$\min \sum_{i=1}^{n} \left( \|N_i w_i - F_i\|_2^2 + \lambda e^T w_i \right) \quad \text{s.t. } w_i \ge 0$$
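Why a larger λ removes more links can be seen from the optimality conditions. The derivation below is a sketch that assumes the per-user objective has the form $g(w) = \|N w - F\|_2^2 + \lambda e^T w$ with $w \ge 0$, where $n_j$ denotes the column of $N$ belonging to friend $j$ (notation introduced here for illustration).

```latex
\begin{align*}
\nabla g(w) &= 2 N^T (N w - F) + \lambda e \\
w_j > 0 &\;\Rightarrow\; 2\, n_j^T (F - N w) = \lambda \\
w_j = 0 &\;\Rightarrow\; 2\, n_j^T (F - N w) \le \lambda \\
w = 0 \text{ is optimal} &\;\Leftrightarrow\; \lambda \ge 2 \max_j n_j^T F
\end{align*}
```

A friend keeps a positive weight only if the correlation between that friend's features and the remaining residual reaches $\lambda / 2$, so raising λ prunes weaker ties first and, past the last threshold, removes every link.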
Experiments
Denoising with Multiple Interactions
Experiments
Link Reduction Analysis
Flickr users share more similarities with their social networks.
On average, a Flickr user has 178 tags, while BlogCatalog and BlogMI users have only 8 tags.
On average, a Flickr user shares 24.43 tags with a neighbor, while BlogCatalog and BlogMI users share only 0.27 and 0.24 tags, respectively.
Flickr users have fewer friends compared to the users on the other two datasets.
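The two kinds of statistics on this slide, the average number of tags shared with a neighbor and the fraction of links removed, could be computed along the following lines. This is a sketch under assumed data structures (adjacency lists and per-user tag sets); the function names and the toy data are hypothetical and are not the datasets used in the experiments.

```python
def average_shared_tags(adjacency, tags_by_user):
    """Mean number of tags a user has in common with each neighbor."""
    shared, pairs = 0, 0
    for user, friends in adjacency.items():
        for fr in friends:
            shared += len(tags_by_user.get(user, set())
                          & tags_by_user.get(fr, set()))
            pairs += 1
    return shared / pairs if pairs else 0.0


def link_reduction(original, denoised):
    """Fraction of links removed by de-noising."""
    before = sum(len(friends) for friends in original.values())
    after = sum(len(friends) for friends in denoised.values())
    return (before - after) / before if before else 0.0


# Hypothetical toy network before and after de-noising.
original = {"u1": ["u5", "u7"], "u5": ["u1"]}
denoised = {"u1": ["u5"], "u5": ["u1"]}
tags_by_user = {"u1": {"t1", "t2"}, "u5": {"t1", "t2", "t3"}, "u7": {"t3"}}

avg_shared = average_shared_tags(original, tags_by_user)
reduction = link_reduction(original, denoised)
print(round(avg_shared, 4), round(reduction, 4))
```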
Experiments
Advantages of Denoising Social Networks
Discussion
Limitations on Denoising
Task oriented:
  A “noisy user” is considered to be a user who holds irrelevant or opposite interests to the target user.
  Denoising provides a purer environment that better presents the homophily effect for the behavioral inference task.
Loss of negative information:
  The denoised network is analogous to a user-user network with trust information (positive information) only, without the observation of distrust information (negative information).
  Denoising may reduce the performance of tasks like item recommendation.
Conclusion
We propose an efficient approach to de-noise social networks for behavior inference by utilizing multiple types of interactions in social media.
The advantages of de-noising social networks are verified via the performance of behavioral inference, link reduction, and time efficiency.
Network de-noising not only improves performance but also makes social networks more compact for efficient social media mining.
Future Work
Integrating all interactions vs. selectively integrating: Linking + Tagging vs. Linking + Tagging + Commenting
Interactions across social media:
  Facebook: liking
  Twitter: re-tweeting
  LinkedIn: connecting
  Foursquare: check-in
Acknowledgments
Co-authors and members in DMML@ASU
ONR (Office of Naval Research)
Questions