Date post: 18-Mar-2021
USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR LARGE-SCALE TEMPORAL GRAPH PROCESSING Matthias Steinbauer, Gabriele Anderst-Kotsis Institute of Telecooperation
Transcript
Page 1: USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR …...⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010 ... ⬛ In collaboration with Ecker (iiWAS 2015) data from Internet

USING DYNAMOGRAPH: APPLICATION SCENARIOS FOR LARGE-SCALE TEMPORAL GRAPH PROCESSING

Matthias Steinbauer, Gabriele Anderst-Kotsis

Institute of Telecooperation

TALK OUTLINE

⬛ Introduction and Motivation
⬛ Preliminaries on temporal graphs
⬛ DynamoGraph: a platform for large-scale temporal graph processing
⬛ How users visit the web
⬛ Real-world Social Networks
⬛ The Global Social Learning Network
⬛ Conclusions

INTRODUCTION / MOTIVATION

⬛ Graphs serve as models for real-world structures in many different disciplines
  ⬛ Social sciences > Social networks
  ⬛ Biology > Protein-Protein Interactions
  ⬛ Cartography > Digital road maps
  ⬛ Web > The web graph

⬛ Graph and network models are very well studied in mathematics, computer science, and related disciplines

⬛ So why is there a need for new research in this area?

REAL WORLD GRAPHS OFTEN GROW TO LARGE SCALES

It is not feasible to process large graphs with traditional tools and algorithms

Large graphs require new tactics for visualisation

Large graphs exceed the memory size of a single computer

THE DIMENSION OF TIME CANNOT BE NEGLECTED

Static reachability measures in social networks do not hold for dynamic networks

Biological processes are time dependent

Static views on graphs often show blurred or too dense data

TEMPORAL GRAPHS

F. Harary and G. Gupta. Dynamic Graph Models. Mathl. Comput. Modelling, 25(7):79–87, 1997.

A graph G is a pair (V, E), where V denotes the set of vertices and E denotes the set of edges between any u, v ∈ V

A temporal graph T can be given as a set of graphs T = {G_1, G_2, G_3, …, G_t}, where each G_x = (V_x, E_x)
G_x is called a static snapshot at time x
G_{tm..tn}, a selection of multiple G_x from T, is a static snapshot for a timespan
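The definitions above can be sketched in a few lines of Python. This is a minimal sketch and not DynamoGraph's data model: the `Snapshot` type, the `span_snapshot` helper, and the choice to merge the selected snapshots by set union are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

Edge = Tuple[int, int]  # directed edge (u, v) with u, v in V


class Snapshot(NamedTuple):
    """A static snapshot G_x = (V_x, E_x)."""
    vertices: FrozenSet[int]
    edges: FrozenSet[Edge]


def span_snapshot(T: Dict[int, Snapshot], tm: int, tn: int) -> Snapshot:
    """Combine all G_x with tm <= x <= tn into one static snapshot.

    Assumption: the selected snapshots are merged by set union.
    """
    vertices, edges = set(), set()
    for x, g in T.items():
        if tm <= x <= tn:
            vertices |= g.vertices
            edges |= g.edges
    return Snapshot(frozenset(vertices), frozenset(edges))


# Hypothetical temporal graph T = {G_1, G_2, G_3}
T = {
    1: Snapshot(frozenset({1, 2}), frozenset({(1, 2)})),
    2: Snapshot(frozenset({2, 3}), frozenset({(2, 3)})),
    3: Snapshot(frozenset({1, 3}), frozenset({(3, 1)})),
}
g_1_2 = span_snapshot(T, 1, 2)  # static snapshot for the timespan 1..2
```

Other merge rules (for example summing edge weights over the timespan) would fit the same definition; union is simply the smallest choice that works for unweighted edges.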

VERTEX CENTRIC EMBODIMENT

{
  id: 39827736,
  name: 'Rob Henderson',
  description: '',
  inEdges: [
    { weight: 7.3, edgeType: 'PHONE', source: 39761932, target: 39827736 }
  ],
  outEdges: [
    { weight: 10.0, edgeType: 'EMAIL', source: 39827736, target: 39761932 }
  ],
}
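One way to arrive at documents of this shape from a plain edge list is sketched below. The helper name `to_vertex_documents` is hypothetical, the `name` and `description` fields are left out, and nothing here is taken from DynamoGraph's own import code.

```python
from collections import defaultdict


def to_vertex_documents(edges):
    """Group an edge list into one document per vertex.

    Each edge is stored twice: in the outEdges of its source and in the
    inEdges of its target.
    """
    docs = defaultdict(lambda: {"inEdges": [], "outEdges": []})
    for edge in edges:
        docs[edge["source"]]["outEdges"].append(edge)
        docs[edge["target"]]["inEdges"].append(edge)
    # Attach the vertex id to every document.
    return {vid: {"id": vid, **doc} for vid, doc in docs.items()}


edges = [
    {"weight": 7.3, "edgeType": "PHONE", "source": 39761932, "target": 39827736},
    {"weight": 10.0, "edgeType": "EMAIL", "source": 39827736, "target": 39761932},
]
docs = to_vertex_documents(edges)
```

Storing every edge on both endpoints duplicates data, but it lets a vertex see its incoming and outgoing edges without consulting any other document.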

VERTEX CENTRIC EMBODIMENT WITH TEMPORAL DOCUMENT

{
  id: 39827736,
  resolution: 'MONTHS',
  '1420070400': {
    name: 'Rob Henderson',
    description: '',
    inEdges: [
      { weight: 3.3, edgeType: 'PHONE', source: 39761932, target: 39827736 }
    ],
    outEdges: [
      { weight: 4.0, edgeType: 'EMAIL', source: 39827736, target: 39761932 }
    ],
  },
  '1422748800': {
    inEdges: [
      { weight: 4.0, edgeType: 'PHONE', source: 39761932,
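The keys '1420070400' and '1422748800' are the Unix timestamps of 1 January 2015 and 1 February 2015 (00:00 UTC), which matches the 'MONTHS' resolution. A lookup against such a document could be sketched as follows; `month_key` and `state_at` are hypothetical helper names, and the sketch assumes that keys are always the UTC start of a month.

```python
from datetime import datetime, timezone


def month_key(ts: int) -> str:
    """Truncate a Unix timestamp (seconds) to the start of its UTC month."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    start = datetime(dt.year, dt.month, 1, tzinfo=timezone.utc)
    return str(int(start.timestamp()))


def state_at(doc: dict, ts: int):
    """Return the vertex state stored for the month containing ts, or None."""
    if doc.get("resolution") != "MONTHS":
        raise ValueError("this sketch only handles monthly resolution")
    return doc.get(month_key(ts))


# Hypothetical, shortened document in the shape shown on the slide.
doc = {
    "id": 39827736,
    "resolution": "MONTHS",
    "1420070400": {"name": "Rob Henderson", "inEdges": [], "outEdges": []},
    "1422748800": {"inEdges": [], "outEdges": []},
}
mid_january = 1421280000  # 2015-01-15 00:00:00 UTC
january_state = state_at(doc, mid_january)
```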

HORIZONTAL SCALABILITY

[Figure: hosts host1, host2, host3]

PREGEL / COMPUTE AGGREGATE BROADCAST

G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. In ACM SIGMOD International Conference on Management of Data, 2010.

[Figure: timeline of a Master and Workers 1 to 4. Initialisation is followed by Steps 1 to 3, each consisting of Local Computation, Message Routing and Synchronisation; the run ends with Halting.]
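The phases named in the figure (initialisation, local computation, message routing, synchronisation, halting) can be illustrated with a single-process simulation. The sketch below propagates the maximum value through a graph, a common textbook example for Pregel-style processing. It is not DynamoGraph or Pregel code: `run_supersteps` is a hypothetical name and there is no master, worker or partition here.

```python
from collections import defaultdict


def run_supersteps(values, out_edges, max_steps=30):
    """Propagate the maximum value through a graph in supersteps.

    values:    dict vertex -> initial value
    out_edges: dict vertex -> list of target vertices
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    # Initialisation: every vertex announces its value to its neighbours.
    inbox = defaultdict(list)
    for v, value in values.items():
        for target in out_edges.get(v, []):
            inbox[target].append(value)

    steps = 0
    while inbox and steps < max_steps:
        steps += 1
        outbox = defaultdict(list)
        # Local computation: only vertices that received messages are active.
        for v, messages in inbox.items():
            best = max(messages)
            if best > values[v]:
                values[v] = best
                # Message routing: forward the new value to the neighbours.
                for target in out_edges.get(v, []):
                    outbox[target].append(best)
        # Synchronisation: messages become visible in the next step only.
        inbox = outbox
    # Halting: no messages are in flight (or max_steps was reached).
    return values, steps


# Hypothetical four-vertex chain with edges in both directions.
out_edges = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
final_values, supersteps = run_supersteps({1: 3, 2: 6, 3: 2, 4: 1}, out_edges)
```

Because messages sent in one step are only read in the next, the result does not depend on the order in which vertices are visited within a step, which is what allows the local computation to be spread over several workers.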

DYNAMOGRAPH ARCHITECTURE

[Figure: architecture diagram. Labels shown: ZooKeeper; Private Communication Network; Public Communication Network; Client with ClientApp; Client API; Host 0 (Master) with MasterProcess, PartitionManager, SuperStepManager, CodeManager; Host 1 (Worker) to Host n (Worker), each with WorkerProcess / Slots, CassandraNode, Partition, StepExecutor, MessageQueue.]

HOW USERS VISIT THE WEB

⬛ The Center for Complex Networks and Systems Research at Indiana University Bloomington collected the Click Dataset
  ⬛ 53.5 billion HTTP headers logged from Sept. 2006 to May 2010
  ⬛ Available as ~2.8 TB compressed (~12 TB uncompressed)

⬛ Imported into DynamoGraph at monthly resolution
  ⬛ Retained only human-generated traffic
  ⬛ Created vertices based on domain names (3.4 million after cleansing)

http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/
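The import step described above (monthly resolution, vertices per domain name) could be sketched roughly as follows. The record layout `(unix_timestamp, referrer_url, target_url)` is hypothetical and not the actual format of the Click Dataset; the sketch uses the full host name as the "domain" and does not attempt the filtering for human-generated traffic.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlsplit


def monthly_domain_edges(records):
    """Count clicks per (month, source domain, target domain).

    records: iterable of (unix_timestamp, referrer_url, target_url);
    this record layout is an assumption made for the example.
    """
    counts = Counter()
    for ts, referrer, target in records:
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        src = urlsplit(referrer).hostname
        dst = urlsplit(target).hostname
        if src and dst:  # skip records without a usable host name
            counts[(month, src, dst)] += 1
    return counts


# Hypothetical records: two clicks in January 2008, one in February 2008.
records = [
    (1199149200, "http://a.example/page", "http://b.example/"),
    (1199232000, "http://a.example/other", "http://b.example/x"),
    (1201824060, "http://a.example/", "http://c.example/"),
]
edge_counts = monthly_domain_edges(records)
```

Each key of the result corresponds to one weighted edge in one monthly snapshot, with the click count as a possible edge weight.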

HOW USERS VISIT THE WEB

⬛ The Click temporal graph was successfully loaded into DynamoGraph
  ⬛ 10 compute nodes with 8 threads (graph partitions) each
  ⬛ On top of an OpenStack cloud

⬛ Experiments with distributed PageRank are currently being conducted
  ⬛ PageRank is computed in sliding windows of 3 to 6 months
  ⬛ Popularity trends in web sites, ad networks, and social networks are expected to be visible in the data

⬛ A first look into the data clearly shows the drop in popularity of the online social network MySpace.com
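The idea of computing PageRank in sliding windows can be illustrated on a single machine. The sketch below is not the distributed implementation used in the experiments: it assumes that the edges of all months in a window are simply merged, uses plain power iteration, and all function names are hypothetical.

```python
def sliding_windows(months, size):
    """Yield consecutive windows of `size` months, shifted by one month."""
    for start in range(len(months) - size + 1):
        yield months[start:start + size]


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for n in nodes:
            for dst in out[n]:
                new[dst] += damping * rank[n] / len(out[n])
        rank = new
    return rank


def windowed_pagerank(edges_by_month, size):
    """Compute PageRank on the merged edges of each sliding window."""
    months = sorted(edges_by_month)
    result = {}
    for window in sliding_windows(months, size):
        edges = [e for m in window for e in edges_by_month[m]]
        result[(window[0], window[-1])] = pagerank(edges)
    return result


# Hypothetical monthly edge lists between three domains.
edges_by_month = {
    "2008-01": [("a", "b")],
    "2008-02": [("b", "a")],
    "2008-03": [("c", "a")],
    "2008-04": [("a", "c")],
}
ranks = windowed_pagerank(edges_by_month, size=3)
```

Comparing the rank of one vertex across the windows in `ranks` gives a simple time series from which a popularity trend could be read.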

REAL WORLD SOCIAL NETWORKS

⬛ Users interacting with each other (online or in real life) form a social network

⬛ Much research from diverse disciplines (sociology, mathematics, computer science, reality mining, …) has already been conducted in this area

⬛ Social networks can be modelled as temporal graphs

⬛ Many social interactions now happen online > easy to track and record

⬛ In collaboration with Ecker (iiWAS 2015), data from Internet Relay Chat (IRC) was logged, annotated, and imported into DynamoGraph

[Figure: network visualisation, full time-span]

[Figure: network visualisation, selected time-span from 1st June 2008 to 15th June 2008]

REAL WORLD SOCIAL NETWORKS

⬛ Data is available as an online resource
⬛ Users can filter and load data into DynamoGraph
⬛ Algorithms for automatic layout of the visualisation are available (ForceAtlas2)

⬛ The visualisation already makes clear that reachability measures computed on the full time-span will often not hold on shorter time-spans
  ⬛ How is information dissemination influenced by that fact?
  ⬛ Can we perform cluster analysis on users and identify topics of interest?
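Why reachability on the full time-span can be misleading is easy to show on a toy example. The sketch below compares static reachability (timestamps ignored) with reachability along time-respecting paths, where timestamps must strictly increase along a path. The edge format `(source, target, timestamp)` and both function names are assumptions for illustration, not part of DynamoGraph.

```python
def static_reachable(edges, source):
    """Vertices reachable from source when timestamps are ignored."""
    adjacency = {}
    for u, v, _t in edges:
        adjacency.setdefault(u, set()).add(v)
    seen, stack = {source}, [source]
    while stack:
        for nxt in adjacency.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def temporal_reachable(edges, source, start, end):
    """Vertices reachable from source along time-respecting paths.

    Only edges with start <= t <= end are used, and timestamps must
    strictly increase along a path.
    """
    arrival = {source: float("-inf")}  # earliest arrival time per vertex
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if start <= t <= end and u in arrival and arrival[u] < t:
            # Edges are visited in time order, so the first arrival is
            # the earliest one.
            arrival.setdefault(v, t)
    return set(arrival)


# Hypothetical interactions: a talks to b at t=10, b talked to c at t=5.
edges = [("a", "b", 10), ("b", "c", 5)]
static = static_reachable(edges, "a")
temporal = temporal_reachable(edges, "a", 0, 20)
```

In this example `static` contains c, while `temporal` does not: the interaction between b and c happened before anything from a could have reached b.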

THE GLOBAL SOCIAL LEARNING NETWORK

⬛ The Experience API (xAPI) is used to store learning logs independently of any particular Learning Management System (LMS)

⬛ Scrapers available that convert data from other systems

⬛ An xAPI proxy service is available that can ingest xAPI statements into DynamoGraph

⬛ Setting up experiments with real students

[Figure: data flow between LMS, Scrapers, xAPI Proxy, xAPI Repository, and DynamoGraph]
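How an xAPI statement might be turned into a timestamped edge is sketched below. The statement fields used (`actor.mbox`, `verb.id`, `object.id`, `timestamp`) are part of the xAPI statement format, but the mapping itself, the edge layout and the name `statement_to_edge` are assumptions; the actual proxy service may work differently, and actors identified by means other than `mbox` are not handled.

```python
from datetime import datetime


def statement_to_edge(statement: dict) -> dict:
    """Map one xAPI statement to a timestamped edge actor -> object."""
    when = datetime.fromisoformat(statement["timestamp"])
    return {
        "source": statement["actor"]["mbox"],
        "target": statement["object"]["id"],
        # Use the last path segment of the verb IRI as the edge type.
        "edgeType": statement["verb"]["id"].rsplit("/", 1)[-1].upper(),
        "timestamp": int(when.timestamp()),
    }


# Hypothetical statement: a learner completed an activity.
statement = {
    "actor": {"mbox": "mailto:learner@example.org"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
    "object": {"id": "http://example.org/activities/unit-1"},
    "timestamp": "2015-01-01T00:00:00+00:00",
}
edge = statement_to_edge(statement)
```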

CONCLUSIONS AND FUTURE WORK

⬛ DynamoGraph, as a platform for large-scale temporal graph processing, has matured enough to be evaluated in scientific scenarios

⬛ Three scenarios were discussed that motivate the need for temporal graph analytics and lay out the paths for our future work

⬛ More temporal graph algorithms are to be implemented (reachability, clustering, …) to provide more interesting metrics to our users

Matthias Steinbauer [email protected]

slides available at http://steinbauer.org/

