Date posted: 22-Dec-2015
Uploaded by: charla-allen
Measurement and Analysis of Online Social Networks
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee
Presented by Aleksandra Potapova
Focus
• graphs of online social networks
  – how they were obtained
  – how they were verified
• how measurement and analysis were performed
• properties of the obtained graphs
• why these properties are relevant
Why should we perform measurements and analysis in social networks?
• To design future online social network based systems
• To understand the impact of online social networks on the Internet
• To reduce the amount of spam
• To improve security
Summary of graph properties
• small-world
• power-law
• scale-free
• correlation between indegree and outdegree
• large strongly connected core of high-degree nodes surrounded by small clusters of low-degree nodes
Crawling Algorithms for large graphs
• BFS and DFS
• Snowball method (crawling only a small subset of a graph by ending BFS early):
  – Partial BFS crawls overestimate node degree and underestimate the level of symmetry.
  – In social networks, they underestimate the power-law coefficient, but closely match other metrics such as the overall clustering coefficient.
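The snowball method described above can be sketched as a BFS that stops after a fixed number of users have been crawled. This is a minimal illustration, not the crawler used in the paper; `get_out_links` (a callable returning a user's outgoing links) and `max_crawled` are hypothetical names introduced here.

```python
from collections import deque

def snowball_crawl(get_out_links, seed, max_crawled):
    """Partial BFS: stop once max_crawled users have had their links fetched.

    get_out_links is a hypothetical callable: user -> iterable of users.
    Returns the set of crawled users and the list of directed links seen.
    """
    crawled = set()
    seen = {seed}
    queue = deque([seed])
    edges = []
    while queue and len(crawled) < max_crawled:
        user = queue.popleft()
        crawled.add(user)
        for friend in get_out_links(user):
            edges.append((user, friend))
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return crawled, edges
```

Users that were discovered but never crawled appear in `edges` only as link targets, which is one way a partial crawl can skew degree and symmetry statistics.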
How should social networks be crawled?
• The focus of the paper
  – WCC (weakly connected component)
  – Forward and reverse links should be used
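To illustrate why both link directions matter, here is a small sketch of a crawl that follows forward and reverse links and therefore reaches the whole weakly connected component of the seed. The callables `get_forward` and `get_reverse` are assumptions made for this example; a real site may not expose reverse links at all.

```python
from collections import deque

def crawl_wcc(get_forward, get_reverse, seed):
    """BFS over forward and reverse links starting from seed.

    get_forward / get_reverse are hypothetical callables: user -> iterable.
    Returns the set of users in the seed's weakly connected component.
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for neighbor in list(get_forward(user)) + list(get_reverse(user)):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

Passing a `get_reverse` that always returns an empty list turns this into a forward-only crawl, which can miss users who link into the crawled set but are never linked to from it.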
How to Verify Samples
1. Obtain a random user sample
  – LJ: feature which returns 5,000 random users
  – Flickr: random 8-digit user id generation
2. Conduct a crawl using these random users as seeds
3. See if these random nodes connect to the original WCC
4. See how the graph structure of the newly crawled graph compares to the original
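Step 3 above can be approximated with a simple membership check. The sketch below assumes the original crawl is available as a set of user ids (`wcc_nodes`) and only tests whether each random seed is already in it; it does not re-crawl from the seeds. The function name and parameters are hypothetical.

```python
def seed_coverage(wcc_nodes, random_seeds):
    """Return (fraction of seeds found in the crawled WCC, list of missed seeds)."""
    seeds = list(random_seeds)
    if not seeds:
        return 0.0, []
    missed = [s for s in seeds if s not in wcc_nodes]
    covered = 1.0 - len(missed) / len(seeds)
    return covered, missed
```

The missed seeds are the ones worth crawling from, to see how large the components outside the original WCC are.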
Site            YT                     Flickr                 LJ                     Orkut
Users (mill.)   1.1                    1.8                    5.2                    3
Links (mill.)   4.9                    22                     72                     223
Symmetry        79.1%                  62.0%                  73.5%                  100.0%
Access          API (users only)       API (users + groups)   API (users + groups)   SS for users + groups
                FW                     FW                     FW + BW
                SS for group info
(FW: forward-only; SS: HTML screen-scraping)
Link Symmetry
• even with directed links, there is a high level of symmetry
• possibly because sites inform users of new incoming links, which encourages them to reciprocate
• makes it harder to identify reputable sources from incoming links alone, because reciprocated links dilute that signal
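As an illustration of the symmetry metric, the sketch below computes the fraction of directed links whose reverse link is also present. It assumes the graph is given as a list of `(source, target)` pairs; whether the paper counts links in exactly this way is an assumption.

```python
def link_symmetry(edges):
    """Fraction of distinct directed links (u, v) for which (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)
```

An undirected network stored as two directed links per friendship yields 1.0 under this definition.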
Power-law node degrees
• Orkut deviates:
  – only 11.3% of the network was reached (effect of the partial BFS crawl, i.e. the snowball method)
  – an artificial cap on a user's number of outgoing links distorts the distribution of high degrees
• differs from the Web
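One common way to estimate a power-law coefficient from a degree sample is the continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(d_i / d_min)). The sketch below applies it to integer degrees, which is only an approximation, and the choice of `d_min` is left to the caller; it is not claimed to be the exact fitting procedure behind the slides.

```python
import math

def powerlaw_alpha_mle(degrees, d_min=1):
    """Continuous MLE for the power-law exponent over degrees >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree strictly above d_min")
    return 1.0 + len(tail) / log_sum
```

Because a partial crawl or a cap on outgoing links changes the tail of the degree sample, the estimate from such data should be read with care.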
Correlation of indegree and outdegree
• over 50% of nodes have indegree within 20% of their outdegree
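The statistic above can be computed as follows. This sketch assumes "within 20%" means |indegree - outdegree| <= 0.2 * outdegree, and that the graph is a list of `(source, target)` pairs; both are assumptions for illustration.

```python
from collections import Counter

def fraction_balanced(edges, tolerance=0.2):
    """Fraction of nodes whose indegree is within `tolerance` of their outdegree."""
    outdeg = Counter(u for u, _ in edges)
    indeg = Counter(v for _, v in edges)
    nodes = set(outdeg) | set(indeg)
    if not nodes:
        return 0.0
    balanced = sum(
        1 for n in nodes
        if abs(indeg[n] - outdeg[n]) <= tolerance * outdeg[n]
    )
    return balanced / len(nodes)
```

Nodes with no outgoing links count as balanced only if they also have no incoming links, which cannot happen for nodes taken from an edge list.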
Link degree correlations
• JDD: joint degree distribution (how often nodes of different degrees connect to each other)
• Knn: mapping between outdegree and the average indegree of all nodes connected to nodes of that outdegree
  – Used as an approximation of the JDD
• YouTube differs because extremely popular users are linked to by many unpopular users
• Orkut shows a bump due to undersampling
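A minimal sketch of the Knn mapping described above: for every outdegree k, pool the targets of all links leaving nodes with outdegree k and average their indegrees. It assumes an edge list of `(source, target)` pairs without duplicate links; the function name is hypothetical.

```python
from collections import Counter, defaultdict

def knn(edges):
    """Map each outdegree k to the average indegree of nodes linked from degree-k nodes."""
    outdeg = Counter(u for u, _ in edges)
    indeg = Counter(v for _, v in edges)
    sums = defaultdict(int)
    counts = defaultdict(int)
    for u, v in edges:
        k = outdeg[u]
        sums[k] += indeg[v]
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

A curve that rises with k suggests that high-degree nodes tend to link to other high-degree nodes; a curve dominated by a few very popular targets looks different.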
Joint degree distribution and Scale-free behaviour
Figure annotations: undersampling of low-degree nodes; celebrity-driven nature; cap on links
Densely connected core
• removing 10% of core nodes breaks the graph up into millions of very small SCCs
• the figures show results as nodes are removed starting with the highest-degree nodes (left) and path length as the graph is constructed beginning with the highest-degree nodes (right)
Figure annotation: sub-logarithmic growth
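The node-removal experiment can be sketched on a small graph as follows: rank nodes by total degree, drop the top fraction, and list the sizes of the strongly connected components (SCCs) that remain. The SCC computation uses Kosaraju's two-pass algorithm, chosen here for brevity; the function names and the use of total degree for ranking are assumptions.

```python
from collections import Counter, defaultdict

def scc_sizes(nodes, edges):
    """Sizes of strongly connected components (iterative Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    # Pass 2: explore the reversed graph in decreasing completion order.
    sizes, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        sizes.append(size)
    return sizes

def sccs_after_removing_top(edges, fraction):
    """Remove the top `fraction` of nodes by total degree; return SCC sizes, largest first."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    removed = set(ranked[:int(len(ranked) * fraction)])
    kept_nodes = [n for n in ranked if n not in removed]
    kept_edges = [(u, v) for u, v in edges
                  if u not in removed and v not in removed]
    return sorted(scc_sizes(kept_nodes, kept_edges), reverse=True)
```

On a star-like graph where every node exchanges links with one hub, removing the hub alone leaves only single-node components, which mirrors the fragmentation described above at a toy scale.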
Tightly clustered fringe
• based on clustering coefficient
• social network graphs show stronger clustering, most likely due to mutual friends
Possibly because personal content is not shared
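For reference, a local clustering coefficient can be computed as the share of a node's neighbor pairs that are themselves linked. The sketch below treats all links as undirected, which is a simplification made for this example and may differ from the directed definition used in the measurements.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficients(edges):
    """Local clustering coefficient per node, ignoring link direction."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-links
            nbrs[u].add(v)
            nbrs[v].add(u)
    coeffs = {}
    for node, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            coeffs[node] = 0.0
            continue
        # Count neighbor pairs that are directly connected to each other.
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs
```

Mutual friends raise this value: if two of a user's friends are also friends with each other, that pair counts as a closed triangle.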
Groups
• group sizes follow a power-law distribution
• groups represent tightly clustered communities
Groups
• Orkut is a special case, possibly because of the partial crawl
Node Value Determination
Directed graph, current model
• nodes with many incoming links (hubs) have value due to their connection to many users
• it becomes easy to spread important information to the other nodes, e.g. DNS
• unhealthy in case of spam or viruses
• in order to send spam, a user has to become a more important node, i.e. amass friends