Contextualized versus Structural Overlapping Communities in Social Media.

Date post: 16-Apr-2017
Upload: mohsen-shahriari
Transcript
Page 1: Contextualized versus Structural Overlapping Communities in Social Media.

Lehrstuhl Informatik 5 (Information Systems)

Prof. Dr. M. Jarke

Learning Layers

Contextualized versus Structural Overlapping Community Structures in Social Media

Mohsen Shahriari, Ying Li, Ralf Klamma

This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Contextualized versus Structural Overlapping Communities in Social Media

Mohsen Shahriari, Sabrina Haefele, Ralf Klamma
Advanced Community Information Systems (ACIS)

RWTH Aachen University, Germany
{shahriari, haefele, klamma}@dbis.rwth-aachen.de

Chair of Computer Science 5
RWTH Aachen University

Page 2: Contextualized versus Structural Overlapping Communities in Social Media.

Outline

Research background
– Necessity of community analysis
– Community detection
Literature & Challenges
Research questions
Baselines & Proposed Methods
Dataset & Metrics
Results
Conclusion & Future Works

Page 3: Contextualized versus Structural Overlapping Communities in Social Media.

Research Background: How to Characterize Networks

Power law
– Eligible for social network analysis
– Presence of hubs

Small-world-ness

Motifs
– Synchronizability, cooperativity, stability and robustness may depend on motif structures

Community structure
– Overlapping community structure
– But also to support other applications
– Scale up information

Page 4: Contextualized versus Structural Overlapping Communities in Social Media.

Degree distribution of the CiteULike user-tag network
Source: Taken from networkscience.wordpress.com

Page 5: Contextualized versus Structural Overlapping Communities in Social Media.

Source: Milgram experiment “The small world problem”

Page 6: Contextualized versus Structural Overlapping Communities in Social Media.

Source: Taken from networkscience.wordpress.com

Page 8: Contextualized versus Structural Overlapping Communities in Social Media.

Research Background: What Is an (Overlapping) Community?

Nodes are densely connected inside communities and sparsely connected between them (Girvan & Newman, 2002)

People with similar interests or needs (Preece, 2000)

Recent research: overlapping structures are dense (Yang & Leskovec, 2012)
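
The density-based definition can be made concrete with a small sketch. The toy graph, the two communities and the helper name `edge_density` are assumptions chosen for illustration; they are not taken from the slides.

```python
def edge_density(nodes, edges):
    """Fraction of possible undirected edges among `nodes` that are present."""
    nodes = set(nodes)
    possible = len(nodes) * (len(nodes) - 1) / 2
    present = sum(1 for u, v in edges if u in nodes and v in nodes)
    return present / possible if possible else 0.0

# Hypothetical toy graph: two triangles that share node 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
community_a = {1, 2, 3}
community_b = {3, 4, 5}

density_a = edge_density(community_a, edges)                  # inside community A
density_b = edge_density(community_b, edges)                  # inside community B
density_all = edge_density(community_a | community_b, edges)  # whole graph
overlap = community_a & community_b                           # nodes in both communities
```

Each community is denser than the graph as a whole, and node 3 belongs to both, which is what makes the cover overlapping rather than disjoint.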

Page 9: Contextualized versus Structural Overlapping Communities in Social Media.

Research Background: What Is an (Overlapping) Community?

Some networks call for other definitions
– Signed social networks: density and balance theory (Doreian, 2004)

Communities are interpreted and defined differently

[Figure: example signed network with positive (+) and negative (-) edge labels]

Page 10: Contextualized versus Structural Overlapping Communities in Social Media.

Research Background: What Is an (Overlapping) Community?

Communities may form when people have ideas, innovations and thoughts to discuss
– When they do not know each other

Page 11: Contextualized versus Structural Overlapping Communities in Social Media.

Literature

Page 12: Contextualized versus Structural Overlapping Communities in Social Media.

Challenges regarding Content-based Overlapping Community Detection (OCD)

Little is known about the significance of content
– Community events, e.g., releases in an open source developer network
– Correlation of content and structural properties of the social media

Few methods detect overlapping community structures
– Most detect only disjoint community structures

Most methods are not suitable for thread-based data structures
– They need extensive tuning

Most approaches do not work on actual posts/contents
– They mainly use attributes/tags

Page 13: Contextualized versus Structural Overlapping Communities in Social Media.

Research Questions

How are structural properties such as the number of overlapping nodes, modularity and average community size affected by contextualized similarities among users in question & answer social platforms?

Can adding content improve the performance of structure-based algorithms?

Page 14: Contextualized versus Structural Overlapping Communities in Social Media.

Structural/Content-Based OCD Approaches

First we introduce the baselines used in this work
– Disassortative degree Mixing and Information Diffusion (DMID)
– Speaker-listener Label Propagation Algorithm (SLPA)
– Stanoev, Smilkov and Kocarev (SSK)
– Algorithm by Li, Zhang, Liu, Chen and Zhang (CLiZZ)

Then we introduce the proposed content-based methods
– Cost function optimization clustering algorithm (CFOCA)
– Term community merging algorithm (TCMA)
– Combining content and structural values

Page 15: Contextualized versus Structural Overlapping Communities in Social Media.

Baseline Methods: Disassortative Degree Mixing and Information Diffusion (DMID)

Detecting the most influential nodes (leaders)
– Using the disassortative degree mixing property
– Row-normalizing the disassortative matrix
– Performing a random walk
– Computing the local leadership value
– Combining degree and disassortative value

Cascading behavior, referred to as a network coordination game
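
The leader-detection steps can be sketched as follows. This is an illustration under stated assumptions, not the DMID implementation: the slides do not give the formulas, so the edge weight (absolute degree difference), the time-averaged random walk and the product of visit frequency and degree are assumptions, as are the function name and the toy graph.

```python
def leadership_values(adjacency, steps=50):
    nodes = sorted(adjacency)
    degree = {u: len(adjacency[u]) for u in nodes}

    # Assumed disassortative weight: absolute degree difference of the endpoints.
    weight = {u: {v: abs(degree[u] - degree[v]) for v in adjacency[u]} for u in nodes}

    # Row-normalize; rows whose weights sum to zero stay zero in this sketch.
    transition = {}
    for u in nodes:
        total = sum(weight[u].values())
        transition[u] = {v: (w / total if total else 0.0) for v, w in weight[u].items()}

    # Random walk from a uniform start; visits are averaged over all steps
    # so that periodic walks still give a stable value.
    current = {u: 1.0 / len(nodes) for u in nodes}
    visits = {u: 0.0 for u in nodes}
    for _ in range(steps):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            for v, p in transition[u].items():
                nxt[v] += current[u] * p
        current = nxt
        for u in nodes:
            visits[u] += current[u] / steps

    # Assumed combination: disassortative value (visit frequency) times degree.
    return {u: visits[u] * degree[u] for u in nodes}

# Hypothetical toy graph: node 0 is a hub connected to everyone else.
adjacency = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
scores = leadership_values(adjacency)
leader = max(scores, key=scores.get)
```

On this graph the hub receives the highest score, which matches the intuition that leaders are high-degree nodes surrounded by lower-degree neighbours.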

Page 16: Contextualized versus Structural Overlapping Communities in Social Media.

Baseline Methods: Speaker-listener Label Propagation Algorithm (SLPA)

Extension of the label propagation algorithm
– Nodes can take multiple labels

Idea: speaker-listener information propagation process (mimics human communication)

Nodes can store updated labels

Steps:
1. Each node's memory is initialized with a unique label
2. Repeat until a user-defined number of iterations is reached:
   1. Select one node as listener
   2. Each neighbor randomly selects a label
   3. The listener accepts one of the propagated labels according to a rule (e.g., most popular label)
3. Post-processing phase for identifying the communities
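
The three steps can be sketched in a few lines. This is a minimal illustration assuming the "most popular label" listener rule and a frequency threshold in the post-processing phase; the function name `slpa`, the parameter defaults and the toy graph are assumptions, not values from the slides.

```python
import random
from collections import Counter, defaultdict

def slpa(adjacency, iterations=20, threshold=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: every node's memory starts with its own unique label.
    memory = {node: [node] for node in adjacency}
    order = sorted(adjacency)
    # Step 2: repeat for a user-defined number of iterations.
    for _ in range(iterations):
        rng.shuffle(order)
        for listener in order:
            speakers = sorted(adjacency[listener])
            if not speakers:
                continue
            # Each neighbour speaks one label drawn at random from its memory.
            spoken = Counter(rng.choice(memory[s]) for s in speakers)
            # Listener rule: accept the most popular label, ties broken at random.
            best = max(spoken.values())
            candidates = sorted(label for label, c in spoken.items() if c == best)
            memory[listener].append(rng.choice(candidates))
    # Step 3: post-processing keeps labels whose share of the memory reaches the threshold.
    communities = defaultdict(set)
    for node, labels in memory.items():
        for label, count in Counter(labels).items():
            if count / len(labels) >= threshold:
                communities[label].add(node)
    return [members for _, members in sorted(communities.items())]

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = slpa(adjacency)
```

Because a node keeps every label that passes the threshold, it can end up in several communities; raising the threshold pushes the result toward a disjoint partition.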

Page 17: Contextualized versus Structural Overlapping Communities in Social Media.

Baseline Methods: Stanoev, Smilkov and Kocarev (SSK)

An algorithm based on influence dynamics and membership computation
– Relationships of nodes and their influences are more important than direct connections
– Proxies among nodes are better established when there exist triangles among nodes

Computing the transitive link matrix using both the adjacency matrix and triangle occurrences

Computing the membership of nodes to leaders
– Weighted average membership of neighbors
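
A rough sketch of the two computations follows. The slides do not state how adjacency and triangle occurrences are combined, so the weight used here (1 for the direct link plus the number of triangles the edge takes part in) is an assumption, as are the function names, the fixed leaders and the toy graph.

```python
def transitive_links(adjacency):
    """Assumed weight: 1 for the direct link plus the number of triangles on the edge."""
    return {
        u: {v: 1 + len(adjacency[u] & adjacency[v]) for v in adjacency[u]}
        for u in adjacency
    }

def update_memberships(links, memberships, leaders):
    """One sweep: each non-leader takes the link-weighted average of its neighbours' memberships."""
    updated = {}
    for u, neighbours in links.items():
        if u in leaders:
            updated[u] = memberships[u]  # leaders keep full membership in their own community
            continue
        total = sum(neighbours.values())
        updated[u] = [
            sum(w * memberships[v][k] for v, w in neighbours.items()) / total
            for k in range(len(leaders))
        ]
    return updated

# Hypothetical toy graph: two triangles sharing node 2, with leaders 0 and 4.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
leaders = [0, 4]
memberships = {u: [0.5, 0.5] for u in adjacency}
memberships[0] = [1.0, 0.0]
memberships[4] = [0.0, 1.0]

links = transitive_links(adjacency)
for _ in range(10):
    memberships = update_memberships(links, memberships, leaders)
```

Node 2 sits between both leaders and keeps an even split, while node 1 leans toward leader 0; fractional memberships of this kind are what yield overlapping communities.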

Page 18: Contextualized versus Structural Overlapping Communities in Social Media.

Baseline Methods: CLiZZ

Two-phase algorithm
– Identifying influential nodes based on influence range
– Influence ranges are computed based on shortest distance
– Computing membership values of nodes using an updating rule
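
The first phase can be illustrated with a breadth-first search. This sketch covers only the influence scores; the exponential decay, the parameter `delta`, the cut-off `influence_range` and all names are assumptions, since the slides only say that influence ranges depend on shortest distances.

```python
import math
from collections import deque

def bfs_distances(adjacency, source, max_distance):
    """Shortest hop distances from `source`, explored up to `max_distance`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_distance:
            continue
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def influence_scores(adjacency, influence_range=2, delta=1.0):
    """Assumed score: sum of exp(-d / delta) over nodes within the influence range."""
    scores = {}
    for u in adjacency:
        dist = bfs_distances(adjacency, u, influence_range)
        scores[u] = sum(math.exp(-d / delta) for v, d in dist.items() if v != u)
    return scores

# Hypothetical toy graph: two triangles sharing node 2.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
scores = influence_scores(adjacency)
most_influential = max(scores, key=scores.get)
```

Node 2 reaches every other node in one hop and therefore scores highest; nodes whose score exceeds that of the nodes in their range would be the leader candidates.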

Page 19: Contextualized versus Structural Overlapping Communities in Social Media.

Proposed Content-Based Methods: Feature Creation Phase

Term Matrix
– Constructed from threads of the user
– Converted by tf-idf

[Figure: Threads → tf-idf → Term Matrix]

Term Matrix

w1    w2    w3    …
0.23  0.5   0
0.8   0     1
0     1.2   0.59
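
A term matrix of this kind can be built as in the sketch below. The tf-idf variant (relative term frequency times the natural log of inverse document frequency), the whitespace tokenizer and the example texts are assumptions; the slides do not say which variant or preprocessing was used.

```python
import math
from collections import Counter

def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one row per document and one column per term."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    document_frequency = Counter(term for tokens in tokenized for term in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([
            (counts[term] / len(tokens)) * math.log(n_docs / document_frequency[term])
            for term in vocabulary
        ])
    return vocabulary, matrix

# Hypothetical input: the concatenated thread text of three users (non-empty strings).
user_threads = [
    "applet rendering bug",
    "rendering script error",
    "molecule export script",
]
vocabulary, term_matrix = tfidf_matrix(user_threads)
```

Terms that occur in every document get weight zero, so the rows emphasize what distinguishes one user's threads from the others.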

Page 20: Contextualized versus Structural Overlapping Communities in Social Media.

Cost Function Optimization Clustering Algorithm (CFOCA)

Minimization of the costs
– Cost function J based on cosine similarity

Updating the centroids using gradient descent

Modification for overlapping communities: threshold for distance to other centroids
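
The sketch below shows one way these three ideas fit together. It assumes the cost J is the sum of cosine distances (1 minus cosine similarity) between each row and its nearest centroid, and that a row additionally joins every centroid whose cosine distance is within `overlap_threshold`. The exact cost and threshold rule are not given in the slides, so the formulas, names, parameters and toy data are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms  # assumes non-zero vectors

def cosine_gradient(x, c):
    """Gradient of cosine(x, c) with respect to the centroid c."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_c = math.sqrt(sum(b * b for b in c))
    dot = sum(a * b for a, b in zip(x, c))
    return [a / (norm_x * norm_c) - dot * b / (norm_x * norm_c ** 3) for a, b in zip(x, c)]

def cfoca_sketch(rows, centroids, learning_rate=0.5, epochs=50, overlap_threshold=0.3):
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    for _ in range(epochs):
        # Assign every row to its most similar centroid.
        nearest = [max(range(k), key=lambda j: cosine(x, centroids[j])) for x in rows]
        # Descending on J = sum(1 - cosine) means ascending on the cosine similarity.
        for j in range(k):
            members = [x for x, a in zip(rows, nearest) if a == j]
            if not members:
                continue
            grads = [cosine_gradient(x, centroids[j]) for x in members]
            mean_grad = [sum(g[d] for g in grads) / len(members) for d in range(len(centroids[j]))]
            centroids[j] = [c + learning_rate * g for c, g in zip(centroids[j], mean_grad)]
    # Overlap rule: nearest centroid plus every centroid within the distance threshold.
    memberships = []
    for x in rows:
        distances = [1.0 - cosine(x, c) for c in centroids]
        closest = min(range(k), key=distances.__getitem__)
        memberships.append({closest} | {j for j, d in enumerate(distances) if d <= overlap_threshold})
    return centroids, memberships

# Hypothetical term-matrix rows for five users and two starting centroids.
rows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
centroids, memberships = cfoca_sketch(rows, [[1.0, 0.2], [0.2, 1.0]])
```

The fifth user writes about both topics equally and lands in both communities, while the other four stay in one community each.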

Page 21: Contextualized versus Structural Overlapping Communities in Social Media.

Term Community Merging Algorithm (TCMA)

Two phases
– Compute one community per word
– Refine the communities using the overlapping coefficient

Term Matrix

w1    w2    w3    …
0.23  0.5   0
0.8   0.76  1
0     1.2   0.59
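
The two phases can be sketched as below. The overlap coefficient is taken to be |A ∩ B| / min(|A|, |B|); the membership rule (weight above `weight_threshold`), the merge threshold, the function names and the toy matrix are assumptions, since the slides do not specify them.

```python
def overlap_coefficient(a, b):
    return len(a & b) / min(len(a), len(b))

def tcma_sketch(term_matrix, weight_threshold=0.0, merge_threshold=0.5):
    n_terms = len(term_matrix[0])
    # Phase 1: one community per term, holding every user whose weight exceeds the threshold.
    communities = []
    for t in range(n_terms):
        members = {u for u, row in enumerate(term_matrix) if row[t] > weight_threshold}
        if members:
            communities.append(members)
    # Phase 2: keep merging the first pair whose overlap coefficient reaches the threshold.
    merged = True
    while merged:
        merged = False
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                if overlap_coefficient(communities[i], communities[j]) >= merge_threshold:
                    other = communities.pop(j)
                    communities[i] |= other
                    merged = True
                    break
            if merged:
                break
    return communities

# Hypothetical term matrix: five users (rows) and four terms (columns).
term_matrix = [
    [0.4, 0.2, 0.0, 0.0],
    [0.3, 0.5, 0.0, 0.0],
    [0.0, 0.1, 0.6, 0.0],
    [0.0, 0.0, 0.2, 0.7],
    [0.0, 0.0, 0.3, 0.4],
]
communities = tcma_sketch(term_matrix)
```

The four term communities collapse into two, and user 2 remains in both because the two merged groups share too few members to be merged again.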

Page 22: Contextualized versus Structural Overlapping Communities in Social Media.

Content-Based Weighting Method

Generate two weights from content

Use OCD algorithms such as DMID, SSK and CLiZZ to compute communities

[Figure: Threads, user pair (r, s), Term Matrix]

Term Matrix

w1    w2    w3    …
0.23  0.5   0     …
0.8   0     1     …
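
One plausible reading is that each edge (r, s) of the interaction graph receives a weight derived from the two users' rows in the term matrix, and the weighted graph is then passed to a structural algorithm. The sketch below assumes cosine similarity as that weight; how the slides derive their two weights is not stated, so this formula and all names are assumptions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def content_weighted_edges(edges, term_matrix):
    """Map each edge (r, s) to a content-based weight (assumed: cosine similarity)."""
    return {(r, s): cosine_similarity(term_matrix[r], term_matrix[s]) for r, s in edges}

# Hypothetical data: three users, two terms, two reply edges.
term_matrix = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]
weights = content_weighted_edges(edges, term_matrix)
```

Users 0 and 1 write about the same term and get a strong tie, while the edge between users 1 and 2 is weighted down to zero even though it exists structurally.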

Page 23: Contextualized versus Structural Overlapping Communities in Social Media.

Datasets and Metrics

Jmol dataset
– Forum discussion regarding a Java tool for molecular modeling of chemical structures
– Open source development
– 2002 – 2012
– Publicly available at https://github.com/rwth-acis/REST-OCDServices/wiki/Jmol-Dataset

Combined modularity
– Considering both content and density

Number of overlapping nodes and average community sizes to extract useful information
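
The two simpler metrics are easy to compute from a cover, i.e. a list of node sets. The sketch below uses hypothetical function names and a toy cover; combined modularity is left out because the slides do not give its formula.

```python
from collections import Counter

def overlapping_nodes(communities):
    """Nodes that belong to more than one community."""
    counts = Counter(node for community in communities for node in community)
    return {node for node, c in counts.items() if c > 1}

def average_community_size(communities):
    return sum(len(c) for c in communities) / len(communities)

# Hypothetical cover with one shared node.
cover = [{0, 1, 2}, {2, 3, 4}]
shared = overlapping_nodes(cover)
mean_size = average_community_size(cover)
```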

Page 24: Contextualized versus Structural Overlapping Communities in Social Media.

Similarity Costs versus Average Community Size

Releases 1, 10 and 11 have low content similarity

Release 6 has the highest content similarity
– Its communities have the largest average size

Page 25: Contextualized versus Structural Overlapping Communities in Social Media.

Similarity Costs versus Number of Overlapping Nodes

Releases 2, 3, 4 and 5 have high similarity and few overlapping nodes

Similarity costs are global measures

Page 26: Contextualized versus Structural Overlapping Communities in Social Media.

Similarity Costs versus Modularity

Inverse relation between content similarity and modularity

Page 27: Contextualized versus Structural Overlapping Communities in Social Media.

Average Community Size versus Releases

Content-based algorithms are useful when the structure of the network is missing

Content-based algorithms detect larger communities

Page 28: Contextualized versus Structural Overlapping Communities in Social Media.

Number of Overlapping Nodes versus Releases

Content-based methods may reflect the actual changes

Content-based methods detect higher overlaps than structure-based methods

Page 29: Contextualized versus Structural Overlapping Communities in Social Media.

Conclusion & Future Works

Conclusion & Message:
Content has a significant effect on structure-based techniques
– Changes in community sizes, number of overlapping nodes and modularity
– Content-based methods detect larger communities with larger overlaps

Future Works:
Investigate local similarity costs
Improve time complexity

Page 30: Contextualized versus Structural Overlapping Communities in Social Media.

References

Ahn, Y.-Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761–764. doi:10.1038/nature09182

Derényi, I., Palla, G., & Vicsek, T. (2005). Clique Percolation in Random Networks. Physical Review Letters, 94(16), 160202. doi:10.1103/PhysRevLett.94.160202

Ding, Z., Zhang, X., Sun, D., & Luo, B. (2016). Overlapping Community Detection based on Network Decomposition. Sci Rep, 6(24115). doi:10.1038/srep24115

Doreian, P. (2004). Evolution of Human Signed Networks, 1(2), 277–293. Retrieved from http://snap.stanford.edu/class/cs224w-readings/dorean04evolution.pdf

Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826. doi:10.1073/pnas.122653799

Gunnemann, S., Boden, B., Farber, I., & Seidl, T. (2013). Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In Advances in Knowledge Discovery and Data Mining (pp. 261–275). Springer Berlin Heidelberg.

Gunnemann, S., Farber, I., Boden, B., & Seidl, T. (2010). Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In The 10th International Conference on Data Mining.

Havemann, F., Heinz, M., Struck, A., & Gläser, J. (2011). Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels. Journal of Statistical Mechanics: Theory and Experiment. doi:10.1088/1742-5468/2011/01/P01023

Preece, J. (2002). Supporting Community and Building Social Capital - Guest Editorial. Communications of the ACM, 45(4), 37–39.

Shahriari, M., Parekodi, S., & Klamma, R. (2015). Community-aware Ranking Algorithms for Expert Identification in Question-answer Forums. In Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business. I-KNOW (pp. 1–8). ACM. Retrieved from http://doi.acm.org/10.1145/2809563.2809592

Shen, H., Cheng, X., Cai, K., & Hu, M.-B. (2009). Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications, 388(8), 1706–1712. doi:10.1016/j.physa.2008.12.021

Yang, J., & Leskovec, J. (2012). Structure and Overlaps of Communities in Networks. CoRR, abs/1205.6228.
