
G*: A Parallel System for Efficiently Managing Large Graphs in the Cloud∗

Jeong-Hyon Hwang, Jeremy Birnbaum, Alan Labouseur, Paul W. Olsen Jr., Sean R. Spillane, Jayadevan Vijayan
Department of Computer Science, State University of New York - Albany, USA
{jhh, jbirn, alan, polsen, seans, appu}@cs.albany.edu

Wook-Shin Han
Department of Computer Engineering, Kyungpook National University, Korea

[email protected]

ABSTRACT
Many of today's data analytics applications require processing a series of large graphs that represent an evolving network. This paper presents a new parallel system that efficiently supports these applications in the cloud. This system, G*, stores large graphs on servers in a scalable fashion while compressing the graphs based on their commonalities. Unlike traditional database and graph processing systems, G* can efficiently execute complex queries on large graphs using operators that process graph data in parallel. G* speeds up queries on multiple graphs by processing commonalities among graphs once and then sharing the result across relevant graphs. G* provides a set of processing primitives that abstract away the complexity of distributed computation and enable easy and succinct implementation of operators. This paper presents evaluation results that substantiate the unique benefits of G* over traditional database and graph processing systems.

Categories and Subject Descriptors
H.2.4 [Database Management]: Systems—Parallel Databases

General Terms
Design, Experimentation, Performance

1. INTRODUCTION
Advancements in technology enable access to huge amounts of information about various complex networks including social networks [25, 29], transportation networks [25], communication networks [25, 30], citation networks [25] and the World Wide Web. These networks can be expressed as graphs in which vertices represent entities and edges represent relationships between entities. Most real-world networks change over time, so there is significant interest in understanding their evolution by extracting certain features from a series of graphs that represent a network at different points in time. These features include the distribution of vertex degrees and clustering coefficients [14], network density [18], triadic closure [17], the size of each connected component [14, 16], the shortest distance between pairs of vertices [18, 24], and the centrality or eccentricity of vertices [24]. Trends discovered in this way can play a crucial role in sociopolitical science, national security, marketing, transportation, communication network management, epidemiology and pharmacology.

∗ This work is supported by NSF CAREER award IIS-1149372.

Managing collections of graphs that represent billions of entities and hundreds of billions of connections between entities [7] raises new challenges. First, a cluster of servers such as a public/private cloud must be effectively used to store and process the massive amounts of graph data. Second, graphs that represent a dynamically changing network (e.g., cumulative snapshots of a friendship network) may be substantially similar to each other. It is therefore crucial to take advantage of the commonalities among graphs and avoid redundant storage and processing of graph data. Third, finding trends in the evolution of a network requires a combination of graph processing (e.g., finding the shortest distance between vertices in each graph), aggregation (e.g., computing, for each pair of vertices, the variance of the shortest distance across graphs over time), filtering, and other operations. For this reason, we require a framework that can conveniently and efficiently run complex analytic queries on collections of large graphs.

Existing systems do not effectively address the above challenges. For example, relational database systems require breaking down graph structures into edges recorded in a relation [31]. Graph analysis using relational databases therefore involves costly join operations [10, 31]. On the other hand, current graph processing systems such as Google's Pregel [19], Microsoft's Trinity [28], the open source Neo4j [21], and others [5, 6, 8, 13] can perform only one operation on one graph at a time. Thus, these systems cannot readily support complex queries on multiple graphs. Furthermore, neither relational database systems nor previous graph processing systems can automatically take advantage of the commonalities among graphs in the storage and processing of data. These limitations are experimentally demonstrated in Section 5 of this paper.

We present a new parallel system, G*, that conveniently and efficiently manages collections of large graphs.


[Figure 1: Overview of G*. Each server efficiently manages a subset of vertices and edges from multiple graphs. (a) Creation of graph G1 (time 1): each vertex and its edges are assigned to a server. (b) Addition of vertex e and edge (c, e) in graph G2 (time 2): server γ stores two versions of c (c1 in G1 and c2 with an edge to e in G2) in a deduplicated manner. (c) Addition of f and (d, f) in graph G3 (time 3): server γ stores two versions of d (d1 in both G1 and G2, and d2 with an edge to f in G3).]

As Figure 1(a) shows, G* enables scalable and distributed storage of graph data on multiple servers. In G*, each server is assigned a subset of vertices and their outgoing edges from multiple graphs. Each G* server strives to efficiently manage its data by taking advantage of the commonalities among the graphs. For example, server α in Figure 1 is assigned vertex a and its outgoing edges, which remain the same in graphs G1, G2 and G3. Thus, server α stores vertex a and its edges only once on disk. On the other hand, vertex c obtains a new edge to e in graph G2 (Figure 1(b)). In response to this update, server γ stores c2, a new version of c, which shares commonalities with the previous version, c1, for space efficiency.

To quickly access disk-resident data about a vertex and its edges, each G* server maintains an index that contains <vertex ID, disk location> pairs. This index also takes advantage of the commonalities among graphs to reduce its footprint. Specifically, this index stores only one <vertex ID, disk location> pair for each vertex version, in a collection for the combination of graphs that contain that version. In Figure 1(c), vertex version c2 on server γ represents vertex c and its outgoing edges, which remain the same in graphs G2 and G3. For c2, γ's index stores (c, location(c2)) only once in a collection for the combination of G2 and G3 rather than redundantly storing it for each of G2 and G3. This graph index efficiently stores vertex IDs and disk locations whereas all of the attribute values of vertices and edges are saved on disk. Due to its small size, the index can be kept fully or mostly in memory, enabling fast lookups and updates (Section 3.3). To prevent the graph index from managing too many combinations of graphs, each G* server also automatically groups graphs and separately indexes each group of graphs (Section 3.4).

Just like traditional database systems, G* supports sophisticated queries using a dataflow approach where operators process data in parallel. In contrast to other database systems, G* provides simple yet powerful processing primitives for solving graph problems. These primitives effectively hide the complexity of distributed data management and permit easy and succinct implementation of graph operators. Furthermore, G* operators process a vertex and its edges once and then associate the result with all relevant graphs. This type of sharing accelerates queries on multiple graphs.

To the best of our knowledge, G* is the first system that provides all of the above features, in contrast with previous database and graph processing systems. Our contributions are as follows: We

1. provide an architectural design of a parallel system that conveniently and quickly runs complex queries on collections of large graphs.

2. develop techniques for efficiently storing and indexing large graphs on many servers.

3. present a new parallel dataflow framework that can accelerate queries on multiple graphs by sharing computations across graphs.

4. demonstrate the benefits of the above features with experimental results.

The rest of this paper is organized as follows: Section 2 provides an overview of the G* system. Sections 3 and 4 present G*'s storage and query processing frameworks, respectively. Section 5 shows evaluation results and Section 6 summarizes related work. Section 7 concludes this paper and discusses future research plans.

2. BACKGROUND

2.1 System Architecture of G*
As Figure 2 shows, G* is a distributed system that consists of multiple servers. A server that manages the whole system is called the master. A query submitted to the master is first transformed by the query parser into a network of operators, which is then converted by the query optimizer into an optimized query execution plan (Figure 3). Based on the execution plan, the query coordinator instantiates and executes operators on other servers (Figure 4) by controlling their query execution engines. The graph manager on each server stores and retrieves graph data using the server's memory and disk.


[Figure 2: G* Architecture. The master contains a query parser, query optimizer, and query coordinator; each server (e.g., α, β) contains a query execution engine, a graph manager with a memory buffer, disk, and index, a high availability (HA) module, and a communication layer carrying control and data traffic between servers.]

The communication layer enables reliable communication with remote servers. Finally, the high availability module performs tasks for masking server and network failures, which is not further discussed in this paper. The above components are currently implemented in approximately 30,000 lines of Java code.

2.2 Data Model
G* manages three types of entities: graphs, vertices and edges. If G* were to adopt the relational data model, using a separate relation for each entity type, then graph queries would involve join operations whenever they require data from multiple relations.

To avoid this complication, G* uses a nested data model that captures the relationships between graphs, vertices, and edges. The absolute path to each graph on G*'s distributed file system is used as the ID for that graph. For graphs, G* uses a logical schema graph(id, att1, att2, · · · , {vertex}). The primary key is underlined in the original notation, each atti is a graph attribute, and {vertex} is a multi-valued attribute that represents the set of vertices contained in the graph. For vertices and edges, G* has schemas vertex(graph.id, id, att1, · · · , {edge}) and edge(graph.id, vertex.id, des_id, att1, · · · ), respectively. The primary key is again underlined and {edge} is a multi-valued attribute that represents the set of edges emanating from a vertex.
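For illustration, the nested schema can be pictured as plain Java value classes along the following lines (a minimal sketch; the class and field names are ours, not taken from G*'s implementation):

  import java.util.List;
  import java.util.Map;

  // A graph owns its vertices; a vertex owns its outgoing edges, so typical
  // traversals need no joins across separate relations.
  class Graph {
      String id;                    // absolute path on G*'s distributed file system
      Map<String, Object> atts;     // att1, att2, ...
      List<Vertex> vertices;        // the multi-valued attribute {vertex}
  }

  class Vertex {
      String graphId;               // graph.id (part of the key)
      String id;                    // vertex ID
      Map<String, Object> atts;     // vertex attributes
      List<Edge> edges;             // the multi-valued attribute {edge}: outgoing edges
  }

  class Edge {
      String graphId;               // graph.id
      String srcId;                 // vertex.id of the source vertex
      String desId;                 // des_id: destination vertex ID
      Map<String, Object> atts;     // edge attributes
  }

Because a vertex carries its own edge list, computing something like a vertex degree touches a single object rather than joining separate vertex and edge relations.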

2.3 Query Languages
G* currently supports two query languages. The first language, called PGQL (Procedural Graph Query Language), can directly define a network of operators that the master can construct on G* servers. Figure 3 shows an example which computes the average vertex degree for each graph located in the '/twitter/' directory on G*'s distributed file system.¹

¹ Each line in Figure 3 specifies (a) the type of operator to create (e.g., VertexOperator on line 1) as well as arguments including the operators to connect to in order to obtain input data (e.g., vertex@local on line 2 refers to an operator labeled vertex on the same server), (b) the servers on which to create the operator (e.g., @* on lines 1-3 and @alpha on lines 5-6 indicate operator creation on all servers and on server alpha, respectively), and (c) the label assigned to the operator to create (e.g., vertex on line 1).

1 vertex@* = VertexOperator([], '/twitter/*');
2 degree@* = DegreeOperator([vertex@local]);
3 count_sum@* = PartialAggregateOperator([degree@local],
4                count_sum, degree, graph.id);
5 union@alpha = UnionOperator([count_sum@*]);
6 avg@alpha = AggregateOperator([union@local], avg, graph.id);

Figure 3: Average Degree Query Plan

[Figure 4: Computation of Average Degree. Each server runs a vertex, degree, and count_sum operator; server α additionally runs the union and avg operators. Intermediate tuples pair a value with the IDs of the graphs that share it, e.g., the degree output (a, 2, {G1,G2,G3}) on server α, and the final output is (3/4, {G1}), (4/5, {G2}), (5/6, {G3}).]

Figure 4 illustrates a network of operators constructed according to the example in Figure 3. Figure 4 assumes that graphs G1, G2, and G3 are located in the '/twitter/' directory, and further that each server has grouped its vertices and edges based on the graphs that have them in common. The vertex and degree operators on each server compute the degree of each vertex while associating the result with the IDs of the related graphs. For example, the output of degree on server α indicates that the degree of a is 2 in graphs G1, G2, and G3. The count_sum operator on each server then computes the count and sum of the received vertex degrees with grouping on graph.id (see also lines 3-4 in Figure 3). These partial aggregate values computed on each server are merged by the union operator on server α and then processed by the avg operator, which computes the final result. Further details of query processing, including development of graph processing operators and shared computation across graphs, are discussed in Section 4.

Conceptually, G* can support any query language that can be translated into PGQL. The second language that G* provides is called DGQL (Declarative Graph Query Language). DGQL is similar to SQL, but closer to OQL [2] in that it enables queries defined upon sets of complex objects (e.g., vertices referencing edges to other vertices). Figure 5 shows example queries that are based on the representative applications mentioned in Section 1. These queries are also used in Section 5 to measure the performance of G*. They compute, for each graph of interest, the average vertex degree (Q1), as well as the distribution of clustering coefficients (Q2), the shortest distances to vertices from vertex '1' (Q3), and the sizes of connected components (Q4).

In Figure 5, degree() on line 2 and c_coeff() on line 7 compute the degree and clustering coefficient of each vertex, respectively.


1  select graph.id, avg(degree)                 -- Q1. average vertex degree
2  from (select graph.id, degree(vertex) as degree
3        from graph('/twitter/*').vertex)
4  group by graph.id
5
6  select graph.id, coeff10*0.1, count(*)       -- Q2. c. coeff. dist.
7  from (select graph.id, floor(c_coeff(vertex)*10) as coeff10
8        from graph('/twitter/*').vertex)
9  group by graph.id, coeff10
10
11 select graph.id, min_dist, count(*)          -- Q3. min distance distr.
12 from min_dist(graph('/tree/*'), '1')         -- vertices with min_dist
13 group by graph.id, min_dist
14
15 select graph.id, comp_size, count(*)         -- Q4. compo. size distr.
16 from (select graph.id, comp_id, count(*) as comp_size
17       from comp_id(graph('/twitter/*'),
18                    'edge.message like %Wall Street%')
19       -- vertices with comp_id (found using specified edges)
20       group by graph.id, comp_id)
21 group by graph.id, comp_size

Figure 5: Example Multi-Graph Queries

min_dist() on line 12 computes the shortest distance from vertex '1' to each vertex for every graph in the '/tree/' directory. min_dist() outputs objects that contain the ID of a vertex v and the min_dist value (i.e., the shortest distance from vertex '1' to v). comp_id() on line 17 finds the connected components for each graph in the '/twitter/' directory, using edges of interest (those possibly related to "Wall Street"). comp_id() assigns the same component ID to all of the vertices that are within the same component. comp_id() outputs objects which contain the ID of a vertex and the comp_id value (the ID of the component that contains the vertex).

3. EFFICIENT STORAGE OF GRAPHS
A key requirement in the design and implementation of G* is to succinctly store large graphs by taking advantage of their commonalities. Another important requirement is to effectively utilize the relatively large storage capacity of disks and the high speed of memory. It is crucial to minimize the number of disk accesses in both data storage and retrieval. For example, if each graph edge is accessed with a 10ms disk seek time, it would take 116 days to access 1 billion edges. This section presents a solution that meets the above requirements.
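For reference, the arithmetic behind the 116-day estimate is:

  10^9 edges × 10 ms/seek = 10^7 s ≈ 10^7 / 86,400 s/day ≈ 115.7 days ≈ 116 days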

3.1 Overview of Graph Data Storage
G* manages directed graphs using multiple servers. G* deals with undirected graphs using directed graphs that contain, for each undirected edge, two corresponding directed edges, one in each direction. G* can receive data from external sources such as Twitter's Gardenhose [29] or it can import data files. According to the input data, G* can add, delete, or update vertices and edges in a graph and do the same to the attributes of vertices and edges. G* can create a series of cumulative graphs by periodically cloning the current graph (Section 3.3.3) and then updating only the new graph according to the new data.²

[Figure 6: Organization of a Disk Block. Objects are allocated from the end of the block while information about these objects is stored from the beginning of the block. Disk block 10 starts with the number of entries (4) and the size of each entry, followed by free space, and holds the entries c1, d1, e1, and c2 from the end; an entry for a version of c stores the vertex attributes (e.g., name, email) and its edges (destination ID, weight, messages). The resulting disk locations are c1@10:0 and c2@10:3.]

G* assigns a vertex and its outgoing edges to the same server for high data locality (in Figure 1, server α can access every edge of a without contacting others).³ An update of an edge is therefore handled by the server that stores the source vertex of that edge.

As shown in Figure 1, if a vertex's attributes or outgoing edges change in a graph, the corresponding server saves a new version of the vertex on disk. If a vertex and its edges are updated multiple times in a graph, only the most recent version is kept. Section 3.2 discusses the details of efficiently storing these versions. To quickly access disk-resident data, each G* server uses a buffer pool that keeps a memory cache of disk blocks. Furthermore, each G* server maintains an index to quickly find the disk location of a vertex and its edges, given relevant vertex and graph IDs. This index stores, for each vertex version, only one <vertex ID, disk location> pair in the collection for the combination of graphs that contain that vertex version. We call this collection a VL map (Vertex Location map) since it associates a vertex ID with a disk location. We call a <vertex ID, disk location> pair a VL pair. In Figure 1(c), information about vertex version c2 is stored only once in the VL map for {G2, G3}. We call the above graph index the CGI (Compact Graph Index) because of its space efficiency. In Figure 1(c), server γ's CGI would have four VL maps: one for each of {G1}, {G2, G3}, {G1, G2} and {G3}.

3.2 Disk Storage of Graph Data
In a variety of graph applications, the edges of a vertex must be processed together (Section 4). To minimize the number of disk accesses, G* stores each vertex and its edges within the same logical disk block. All of the data within a disk block is loaded and saved together, and the logical disk block size is configurable (the default size is 256KB in the current implementation). For each vertex, the vertex ID, attribute values, and all of the outgoing edges are stored on disk (Figure 6). For each edge, the ID of the destination vertex and the attribute values of the edge are saved on disk.

² In this paper, we focus on managing graphs that correspond to periodic snapshots of an evolving network. We plan to extend the current work with input logging so that it can reconstruct graphs as of any point in the past by using periodic snapshots and log data. This future work is not further discussed in this paper.

³ The current G* implementation assigns each vertex to a server based on the hash value of the vertex ID. We are developing data distribution techniques that can reduce the number of edges whose end points are assigned to different servers.


symbol     description
N          number of graphs (graphs: G1, G2, · · · , GN)
N(v)       number of graphs in {G1, ..., GN} that contain vertex v
N(v, Q)    number of graphs, among those in Q, that contain v
V          set of all distinct vertex IDs in {G1, ..., GN} (or the size of that set)
M(v)       number of VL maps that contain vertex v (i.e., number of versions of v)
M(v, Q)    number of VL maps that contain v and are related to a graph in Q (i.e., number of distinct versions of v in graphs Q)
M(Gi)      number of VL maps related to graph Gi
s(Gi)      the size of the ID of graph Gi
p(v)       the size of the VL pair for vertex v

Table 1: Symbols for Cost Analysis

In Figure 6, two versions c1 and c2 of vertex c are stored within disk block 10 at indices 0 and 3, respectively. For space efficiency, c1 and c2 share commonalities. This type of deduplicated storage of complex objects is supported by Java serialization. The above disk locations are represented as "10:0" and "10:3", respectively.
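A minimal sketch of this block layout, using java.nio.ByteBuffer (our illustration only; it mirrors the description of Figure 6 but is not G*'s actual storage code): payloads are written from the end of the block, and the entry count plus per-entry sizes are written from the front, so an entry index such as 0 or 3 can be resolved without scanning the payloads.

  import java.nio.ByteBuffer;

  // Append one serialized object to a logical disk block and return its index.
  // The header (entry count + entry sizes) grows from the front; payload bytes
  // grow from the back, mirroring Figure 6.
  class BlockWriter {
      private final ByteBuffer block;
      private int entries = 0;   // number of entries stored so far
      private int tail;          // start offset of the most recently written payload

      BlockWriter(int blockSize) {
          this.block = ByteBuffer.allocate(blockSize);  // e.g., 256KB by default
          this.tail = blockSize;
      }

      /** Returns the entry index within the block (e.g., 0 for c1, 3 for c2). */
      int append(byte[] serializedVertexAndEdges) {
          int headerEnd = 4 + 4 * (entries + 1);        // count + one size per entry
          if (tail - serializedVertexAndEdges.length < headerEnd)
              throw new IllegalStateException("block full");
          tail -= serializedVertexAndEdges.length;
          block.position(tail);
          block.put(serializedVertexAndEdges);          // payload from the end of the block
          block.putInt(4 + 4 * entries, serializedVertexAndEdges.length); // its size, at the front
          block.putInt(0, ++entries);                   // entry count at offset 0
          return entries - 1;
      }
  }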

3.3 Compact Graph Index
As Figure 7 shows, the CGI maintains VL pairs in a deduplicated fashion by using VL maps. In the current CGI implementation, VL maps use B+ trees. The size of each VL pair (e.g., 16 bytes for the ID and disk location of a vertex) is in general much smaller than that of the disk-resident graph data (e.g., 10 kilobytes of data storing all of the attribute values of a vertex and its edges). Each G* server therefore can usually maintain the entirety or at least a large fraction of its CGI in memory, thus achieving fast data lookup.

The CGI needs to maintain multiple VL maps, one per combination of stored graphs. To iterate over all of the vertices in each graph, the CGI has a root map that associates each graph ID with all of the relevant VL maps (see the shaded triangle in Figure 7(d) that associates G1 with VL maps for {G1}, {G2}, and {G1, G2}).
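The structures described so far can be pictured as follows (a sketch under assumed names; a TreeMap stands in for the B+ tree of each VL map, and the combination lookup and inverted list are the ones referred to in the cost analyses below):

  import java.util.*;

  // One VL map: covers one combination of graphs and maps vertex IDs to the
  // disk locations of the corresponding vertex versions.
  class VLMap {
      final SortedSet<String> graphs = new TreeSet<>();            // graph IDs covered by this map
      final NavigableMap<String, String> pairs = new TreeMap<>();  // vertex ID -> disk location ("block:index")
  }

  class CompactGraphIndex {
      // root map: graph ID -> all VL maps related to that graph
      final Map<String, List<VLMap>> root = new HashMap<>();
      // lookup of the VL map for a given combination (sorted list of graph IDs)
      final Map<List<String>, VLMap> byCombination = new HashMap<>();
      // inverted list maps(v): vertex ID -> VL maps that contain that vertex
      final Map<String, List<VLMap>> invertedList = new HashMap<>();
  }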

While the CGI has benefits in terms of storage, the update overhead of the CGI increases with more VL maps (Sections 3.3.1 and 3.3.3). As experimentally demonstrated in Section 5.2, the number of VL maps managed by the CGI usually does not increase exponentially with the number of graphs. In particular, given a series of cumulative graphs, the number of VL maps increases at most quadratically. The reason behind this phenomenon is that in graphs {G1, ..., GN}, each vertex version is created in some graph Gα and remains the same in the subsequent graphs until it is superseded by a new version in graph Gω. This means that only the graph combinations of the form {Gα, Gα+1, ..., Gω−1} may have common vertices and edges. Section 3.4 presents a technique for controlling the overhead of the CGI. This technique groups graphs and constructs a separate CGI for each group.

3.3.1 Space Cost Analysis
Table 1 shows the symbols used in the cost analysis of the CGI. The analysis results are summarized in Table 2 and experimentally verified in Section 5.

operation            CGI                                             PGI
space                Σ_{v∈V} M(v)p(v) + Σ_{i=1..N} M(Gi)s(Gi)        Σ_{v∈V} N(v)p(v) + Σ_{i=1..N} s(Gi)
put(v, d, g)         O(N + log V)                                    O(log V)
clone(g', g)         O(N M(g))                                       O(V)
getLocations(v, Q)
  index              O(M(v)N + M(v, Q) log V)                        O(|Q| log V)
  data               O(M(v, Q))                                      O(N(v, Q))

Table 2: Cost of Graph Indexing Approaches

Given vertex v and each version vi of v, the CGI contains only one VL pair in the VL map for the graphs that contain vi. Therefore, the total amount of space for storing all of the VL pairs can be expressed as Σ_{v∈V} M(v)p(v), where V, M(v) and p(v) are as defined in Table 1. The space overhead expression for the CGI in Table 2 includes an additional term, Σ_{i=1..N} M(Gi)s(Gi), to account for the space used for the graph IDs contained in the CGI.

On the other hand, consider a naive index structure that maintains a separate VL map for each graph, which is called the PGI (Per-Graph Index) in Table 2. In the PGI, the space required for storing all of the VL pairs is Σ_{v∈V} N(v)p(v) since the PGI contains one VL pair for every graph that contains v. In this case, the amount of space required for storing graph IDs is Σ_{i=1..N} s(Gi).

The above analysis indicates that the CGI becomes more compact than the PGI as each vertex has fewer versions and joins more graphs. For example, if there are 100 graphs and 5 distinct versions of vertex v in these graphs, the CGI will contain only 5 VL pairs for v whereas the PGI will contain 100 pairs for v. In unlikely situations where every vertex and its edges are updated in each graph (e.g., every Twitter user sends a message to a new user every hour), the CGI converges to the PGI. The reason is that there is no commonality between graphs and therefore only one VL map is required for each graph.
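Plugging the numbers of this example into the space terms of Table 2, and taking the 16-byte VL pair mentioned in Section 3.3 as p(v) (an illustrative figure):

  CGI: M(v) × p(v) = 5 × 16 B = 80 B of index entries for v
  PGI: N(v) × p(v) = 100 × 16 B = 1,600 B of index entries for v

so the CGI stores the VL pairs for such a vertex in one twentieth of the space the PGI needs.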

3.3.2 Creation and Update of Compact Graph Index
A CGI needs to be efficient in its space usage. For this reason, it starts with an empty root map. When the first graph G1 is registered into the CGI, the root map adds an entry that contains the ID of G1. The vertices and their edges in G1 are then saved to disk as explained in Section 3.2, while only the vertex IDs and disk locations are inserted into the CGI. Since a large fraction or the entirety of the CGI can be kept in memory, the cost of updating the CGI is negligible compared to the overhead of storing the graph data to disk.

The put(VID v, DLOC d, GID g) method in Figure 8 relates the specified vertex ID (VID) v, disk location (DLOC) d of the vertex data, and graph ID (GID) g to each other. The operation of this method is illustrated in Figure 7. The method first determines whether or not vertex v is already contained in an existing VL map (lines 2-3). If there is no such VL map (line 4), a VL map which is related only to the target graph g is found by using getVLMap(g) (line 5).


[Figure 7: CGI Update Examples. Figures 7(a) and 7(d) show server γ's CGI in Figures 1(a) and 1(b), respectively. (a) Index for G1: the VL map for {G1} holds c1 and d1. (b) G2 as a clone of G1: the same VL map now covers {G1, G2}. (c) Addition of e in G2: e1 goes into a new VL map for {G2}. (d) Update of vertex c in G2: c2 goes into the VL map for {G2} while c1 moves to a VL map for {G1}; d1 stays in the map for {G1, G2}.]

1  public void put(VID v, DLOC d, GID g) {
2    VLMap m1 = getVLMap(v, d);  // VL map having (v, d)
3    VLMap m2 = getVLMap(v, g);  // VL map related with g, having v
4    if (m1 == null && m2 == null)   // if no VL map contains v
5      getVLMap(g).put(v, d);        // put (v, d) in VL map only for g
6    if (m1 != null && !m1.related(g)) {  // m1 not related with g
7      m1.removeVertex(v);
8      common(m1, g).put(v, d);
9    }
10   if (m2 != null) {  // if m2 related with g, contains v
11     DLOC prevD = m2.location(v);  // previous disk loc. of v
12     if (!d.equals(prevD)) {       // current disk loc. of v differs
13       m2.removeVertex(v);
14       common(m1, g).put(v, d);
15       if (!m2.isOnlyRelated(g))   // m2 not related only with g
16         diff(m2, g).put(v, prevD);
17     }
18   }
19 }

Figure 8: CGI Update Method

getVLMap(g) creates a new VL map if none existed before. Then, (v, d) is stored in this VL map (line 5). For example, in Figure 7(b), information about vertex e is not contained in any VL map. Therefore, when vertex e is added in graph G2, the VL map for {G2} stores e's ID and disk location (Figure 7(c)).

On the other hand, if the (v, d) pair is contained in a previous VL map m1 (lines 2 and 6 in Figure 8), the method determines whether or not m1 is related to the target graph g, using m1.related(g) on line 6. If so (i.e., (v, d) is already contained in the VL map related to g), no further action is needed. Otherwise, (v, d) must be moved from m1 (line 7) to a VL map that is related to (1) graph g and (2) all of the graphs related to m1. This map is found using common(m1, g) on line 8.

Suppose that a vertex v in graph g is updated. In this case, the new version of the vertex is stored at a different disk location d rather than the previous disk location prevD, which preserves v's previous version. Next, a VL map m2 which is related to graph g and contains information about v is found (lines 3 and 10). For example, in Figure 7(c), if c is updated in graph G2, the VL map for {G1, G2} corresponds to m2 since it contains information about c. Then, (v, prevD) is removed from m2 (line 13) and stored in the VL map related to all of the graphs that contain (v, prevD), but not graph g. This VL map to store (v, prevD) can be found by diff(m2, g) on line 16. For this reason, in Figures 7(c) and 7(d), information about c1 (i.e., the previous version of c) is moved from the VL map for {G1, G2} to the VL map for {G1}. Furthermore, (v, d) needs to be stored in the VL map related to (1) graph g and (2) all of the other graphs that contain the version of v at disk location d. This VL map is found using common(m1, g) on line 14. In Figures 7(c) and 7(d), information about the new version, c2, of c is stored in the VL map for {G2}.

Cost Analysis. The put(v, d, g) method runs in O(N + log V) time. This is because all of the VL maps that contain vertex v (lines 2-3 in Figure 8) can be found in O(M(v)) time using an inverted list for v (maps(v)) that points to the VL maps that contain v. Given a list of graph IDs, the VL map related to all of the corresponding graphs (lines 5, 8, 14, 16) can be found in O(N) time using a hash map that associates a sorted list of graph IDs with the relevant VL map. Other operations that insert (or remove) a VL pair into (or from) a VL map (lines 5, 7, 13, 14, 16) and that find the disk location of v (line 11) can be completed in O(log V) time since a B+ tree is used for each VL map. Also, M(v) ≤ N. Therefore, all of the above operations can be completed in O(N + log V) time. In contrast to the CGI, the PGI can complete put(v, d, g) in O(log V) time by inserting the (v, d) pair into the VL map for g, which indicates that the CGI has a relatively higher update cost as more graphs are indexed together. However, this extra overhead is in practice negligible when compared to the cost of writing data to disk. Furthermore, Section 3.4 presents a solution that can trade space for a faster update speed.
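For concreteness, the helper lookups used in Figure 8 and in the cost analysis above might look as follows (our sketch; only the names common and diff come from Figure 8, while the byCombination map and the method bodies are assumptions, continuing the VLMap sketch in Section 3.3):

  // Hypothetical helpers of the CGI (illustrative only).
  // byCombination maps a sorted list of graph IDs to the VL map for that combination.
  final Map<List<String>, VLMap> byCombination = new HashMap<>();

  // VL map related to all graphs of m plus graph g (created on demand); also
  // handles m == null, which corresponds to a brand-new vertex version in g.
  VLMap common(VLMap m, String g) {
      List<String> ids = (m == null) ? new ArrayList<>() : new ArrayList<>(m.graphs);
      if (!ids.contains(g)) ids.add(g);
      Collections.sort(ids);
      return byCombination.computeIfAbsent(ids, k -> { VLMap n = new VLMap(); n.graphs.addAll(k); return n; });
  }

  // VL map related to all graphs of m except g: receives the previous version
  // of a vertex when that vertex is updated in g.
  VLMap diff(VLMap m, String g) {
      List<String> ids = new ArrayList<>(m.graphs);
      ids.remove(g);
      Collections.sort(ids);
      return byCombination.computeIfAbsent(ids, k -> { VLMap n = new VLMap(); n.graphs.addAll(k); return n; });
  }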

3.3.3 Graph Cloning
We can obtain a series of cumulative graphs by iteratively cloning the last graph and then adding vertices and edges to the new graph. As Figures 7(a) and 7(b) show, the CGI can quickly create a new clone g' of graph g by updating only the root map as well as the VL maps related to g so that these VL maps are also related to g'.

Cost Analysis. The clone operation can be completed in O(N M(g)) time since M(g) VL maps are related to g and each of these VL maps needs to be related to g'. Associating a VL map with g' takes O(N) time since it requires updating the hash map that associates a sorted list of graph IDs with relevant VL maps (explained in the cost analysis of the put(v, d, g) method in Section 3.3.2). In contrast to the CGI, the PGI takes a substantially longer O(V) time since it must replicate the whole VL map for g.
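Continuing the same sketch, the clone operation (clone(g', g) in Table 2) could be written as below; the O(N M(g)) behavior comes from re-registering each of the M(g) affected VL maps under its enlarged graph combination, and no VL pairs are copied:

  // Make g2 behave as a clone of g1: every VL map related to g1 also becomes
  // related to g2, and the root map gains an entry for g2.
  void cloneGraph(String g2, String g1) {
      List<VLMap> affected = root.getOrDefault(g1, Collections.emptyList());
      for (VLMap m : affected) {
          byCombination.remove(new ArrayList<>(m.graphs));  // drop the old combination key
          m.graphs.add(g2);
          byCombination.put(new ArrayList<>(m.graphs), m);  // re-register under the new key
      }
      root.put(g2, new ArrayList<>(affected));              // root map entry for the clone
  }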

3.3.4 Graph Data Retrieval
To efficiently process queries on multiple graphs, the CGI supports the getLocations(v, Q) method in Figure 9. Given a vertex ID (VID) v and a set of graph IDs (Set<GID>) Q, the method returns a collection of pairs representing (a) a disk location that stores a version of v in Q and (b) the IDs of all of the graphs which contain that version. In Figure 7(d), if the disk locations in Figure 6 are assumed, getLocations(d, {G1, G2}) would return {(10:1, {G1, G2})} and getLocations(c, {G1, G2}) would return {(10:0, {G1}), (10:3, {G2})}, respectively.


1  Set<Pair<DLOC, Set<GID>>> getLocations(VID v, Set<GID> Q) {
2    Set<Pair<DLOC, Set<GID>>> locations = new HashSet();
3    for (VLMap m : maps(v)) {  // for each VLMap having v
4      Set<GID> R = m.relatedGraphs(Q);
5      // graphs among those in Q that are related with m
6      if (R.size() > 0)
7        locations.add(new Pair(m.location(v), R));
8    }
9    return locations;
10 }

Figure 9: CGI Lookup Method

This getLocations(v, Q) method allows G* operators to process each vertex version once and then use the result across all of the graphs which contain that vertex version (Section 4.2). This capability can substantially speed up queries on multiple graphs (Section 5.1).

Cost Analysis. The getLocations(v, Q) method iterates over the VL maps which contain information about vertex v, using the inverted list for v (line 3 in Figure 9). This inverted list, maps(v), which contains M(v) VL maps, is explained in the cost analysis of the put(v, d, g) method. The getLocations(v, Q) method then finds, for each VL map m in maps(v), the set of graphs R that are contained in Q and related to m (line 4). Finding R takes O(N) time since it requires finding the intersection of two sorted lists of graph IDs. This set R of graphs, if it is not empty, is added to the result set in conjunction with the disk location that stores the state of vertex v when v belongs to the graphs in R (lines 6-7). The disk location of v can be found in O(log V) time using the B+ tree for VL map m. Therefore, the overall time complexity of the getLocations(v, Q) method is O(M(v)N + M(v, Q) log V), where M(v, Q) denotes the number of VL maps that contain information about vertex v and that are related to a graph in Q (i.e., the number of distinct versions of v in graphs Q). Reading the M(v, Q) versions of v from disk via the buffer pool may require up to M(v, Q) disk accesses (Table 2) since at most one disk access is needed to read each version of v. On the other hand, accessing a vertex once for each graph using the PGI would cause up to N(v, Q) disk accesses, where N(v, Q) denotes the number of graphs, among those in Q, that contain v.

Example. Suppose that there are 100 graphs and 5 versions of vertex v in these graphs and that the CGI is kept in memory due to its small size. In this case, a query on these graphs requires up to 5 disk accesses for vertex v, one for each version of v. If the PGI is used, the same query would require up to 100 disk accesses for vertex v, one for each graph. In this case, the CGI would be much faster than the PGI.

3.4 Compact Graph Index Splitting
The CGI has low space overhead, enables sharing of computations across graphs, and may substantially reduce the data retrieval overhead. However, as more graphs are added to the CGI, the number of VL maps may grow superlinearly, thereby noticeably increasing the lookup and update overhead.

[Figure 10: CGI Splitting Example. Graphs G1-G9 are partitioned into three groups, with one CGI for {G1, G2, G3}, one for {G4, G5, G6}, and one for {G7, G8, G9}.]

[Figure 11: Determination of Split Point. Average lookup delay and median average lookup delay (500-1100 nanoseconds) versus number of graphs (10-60); a split is triggered when the median delay exceeds minDelay by more than the threshold.]

1 boolean needSplit() {
2   return (lookupDelays.size() >= windowSize &&
3           curDelay() - minDelay > threshold*minDelay);
4 }

Figure 12: CGI Splitting

Our solution to this problem groups graphs and then constructs a separate CGI for each group of graphs in order to limit the number of VL maps managed by each CGI. In other words, this approach trades space for speed by ignoring the commonalities among graphs in different groups. Figure 10 shows an example where one CGI is constructed for every three graphs. The root collection in the example associates each graph (e.g., G1) with the CGI which covers that graph (e.g., the CGI for G1, G2 and G3).

One important challenge in implementing the above approach is to effectively determine the number of graphs covered by each CGI. Figure 12 shows our solution for determining whether or not to split the current CGI into two CGIs: one that covers the previous graphs and another that covers the current and succeeding graphs. The code in Figure 12 is invoked whenever a new graph is added. This method first determines if enough average lookup delays have been entered into a list lookupDelays (line 2). To compute the average lookup delay whenever a new graph is created, each CGI maintains two variables that keep track of the sum and count of lookup delays. If the size of the lookupDelays list is at least windowSize, the method determines whether or not there has been a substantial increase in the lookup delay when compared to the minimum delay observed in the past (line 3).

To capture the general trend in the lookup delay despite measurement inaccuracies (see Figure 11), our approach uses a sliding window on the series of average lookup delays while selecting the median whenever the window advances in response to the addition of a new graph. curDelay on line 3 refers to the median delay selected from the current window (see the dotted line in Figure 11) and minDelay refers to the minimum among all of the previous median delays. In the current implementation, threshold is set to 0.1 to detect any 10% increase in the lookup delay and windowSize is set to 10 by default.


Experimental results in Section 5.2 will show that our CGI splitting approach makes reliable decisions while keeping the lookup delay near the minimum.
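One way the bookkeeping behind Figure 12 could be maintained (an illustrative sketch; only needSplit(), lookupDelays, windowSize, threshold, curDelay, and minDelay are named in the text, and the rest is assumed):

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;

  class SplitDetector {
      final int windowSize = 10;          // default window size
      final double threshold = 0.1;       // detect a 10% increase
      final List<Double> lookupDelays = new ArrayList<>(); // average lookup delay per new graph
      double minDelay = Double.MAX_VALUE; // minimum of the previous windows' median delays

      // Called whenever a new graph is added, with that graph's average lookup delay.
      void record(double avgLookupDelay) {
          if (lookupDelays.size() >= windowSize)
              minDelay = Math.min(minDelay, curDelay()); // fold the previous window's median in
          lookupDelays.add(avgLookupDelay);
      }

      // Median of the most recent windowSize average delays.
      double curDelay() {
          List<Double> window = new ArrayList<>(
              lookupDelays.subList(lookupDelays.size() - windowSize, lookupDelays.size()));
          Collections.sort(window);
          return window.get(window.size() / 2);
      }

      boolean needSplit() {               // as in Figure 12
          return lookupDelays.size() >= windowSize
              && curDelay() - minDelay > threshold * minDelay;
      }
  }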

4. GRAPH QUERY PROCESSING
Traditional database systems transform queries into a network of operators that process data in a pipelined fashion. A central challenge in applying this approach to G* is to develop a new query processing framework that meets the following requirements:

1. The query processing framework must be able to execute operators on multiple servers to efficiently process distributed graphs.

2. G*'s indexing mechanism associates each vertex with all of the graphs that contain that vertex (Section 3.3). The query processing framework needs to use this feature to share computations across relevant graphs.

3. G* requires operators for solving graph problems. The framework must permit easy and succinct implementation of these operators.

Sections 4.1, 4.2 and 4.3 describe how we address the above issues.

4.1 G* Query Processing Framework
Given a query execution plan (Figure 3), the G* master constructs a network of operators (Figure 4) according to that plan. Each G* server, including the master, interacts with the others using remote method invocations (RMIs) [11]. For example, given the command "vertex@* = VertexOperator([], '/twitter/*')" (line 1 in Figure 3), the master invokes the createOperator() method on each G* server while passing (1) the label to assign to the operator (vertex), (2) the type of the operator (VertexOperator), (3) the operators to connect to for input data ([], meaning none in this case), and (4) arguments (the pattern that expresses the graphs to process, '/twitter/*' in this case).

A G* operator, such as degree in Figure 4, can obtain data from another operator using an iterator received from that operator (specifically, by repeatedly calling next() on the iterator). Since native Java RMI [11] does not directly support methods that return an iterator, we constructed our own RMI framework to overcome this limitation. Using our RMI service, the union operator on server α in Figure 4 can obtain an iterator for data from the remote operator count_sum on server β. In this case, the union operator is given a proxy iterator constructed on server α on behalf of the original iterator that the count_sum operator provides on server β. To help the union operator on server α efficiently process data, server β proactively retrieves data using the original iterator from count_sum and sends the data to server α. This enables pipelined processing.
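The proxy-iterator arrangement can be pictured with the following single-process sketch (illustrative only; the real system ships data between servers over its own RMI layer, and all names here are assumptions):

  import java.util.Iterator;
  import java.util.NoSuchElementException;
  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  // A producer thread drains the original iterator into a bounded queue ahead
  // of demand; the consumer sees an ordinary Iterator and blocks on the queue.
  class ProxyIterator<T> implements Iterator<T> {
      private static final Object END = new Object();      // end-of-stream marker
      private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);
      private Object next;

      ProxyIterator(Iterator<T> original) {
          Thread producer = new Thread(() -> {
              try {
                  while (original.hasNext()) queue.put(original.next()); // assumes non-null elements
                  queue.put(END);
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              }
          });
          producer.setDaemon(true);
          producer.start();
      }

      @Override public boolean hasNext() {
          if (next == null) {
              try { next = queue.take(); }
              catch (InterruptedException e) { throw new RuntimeException(e); }
          }
          return next != END;
      }

      @SuppressWarnings("unchecked")
      @Override public T next() {
          if (!hasNext()) throw new NoSuchElementException();
          T result = (T) next;
          next = null;
          return result;
      }
  }

Here the producer thread plays the role of server β, draining count_sum's original iterator ahead of demand, while the consumer on server α sees an ordinary blocking Iterator; this push-ahead behavior is what enables pipelined processing.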

The current G* implementation supports the graph processing operators discussed below in addition to other operators that are analogous to traditional relational operators, such as selection, projection, aggregate and join.

1 public Vertex next() {
2   Vertex v = input(0).next();
3   v.add("degree", v.degree());
4   return v;
5 }

Figure 13: Computation of Vertex Degree

These operators may directly read disk-resident graph data (e.g., the vertex operator in Figure 4), receive data streams from other operators (e.g., the degree, count_sum, union, and avg operators), or exchange special summary values with each other to solve a graph problem (Section 4.3). Each operator produces a stream of objects that represent the result (e.g., the ID and degree of each vertex in the case of the degree operator).

4.2 Sharing of Computations across Graphs
Each vertex operator in Figure 4 obtains an iterator from the set of CGIs (Section 3.4) that cover the graphs being queried (e.g., {G1, G2, G3}). Each invocation of next() on this iterator provides the disk location that stores a relevant vertex and its edges, as well as the IDs of the graphs that contain them. Based on this input data, the vertex operator produces objects, each of which represents a vertex, its edges, and the IDs of the graphs that contain them (e.g., (a, . . . , {G1, G2, G3}) on α in Figure 4). In G*, the Vertex type is used for these objects.

G* operators that consume the output stream of the vertex operator can naturally share computations across relevant graphs. For example, the degree operator on server α in Figure 4 computes the degree of vertex a only once and then incorporates the result (i.e., a degree of 2) into the Vertex object that represents a in graphs G1, G2, and G3. Figure 13 shows the actual implementation of the degree operator. In this implementation, the complexity of dealing with multiple graphs is completely hidden.

4.3 Primitives for Graph Processing
To facilitate the implementation of graph processing operators, G* provides three types of primitives: summaries, combiners, and BSP (Bulk Synchronous Parallel) operators.

4.3.1 Summaries
Graph algorithms typically maintain certain types of values (e.g., the shortest distance from a chosen vertex) for each vertex [5, 8, 19]. A summary is a container that keeps aggregate data (e.g., count, sum) to support operators for solving graph problems. One can implement a custom summary type by implementing the Summary<V, F> interface, where V is the type of values for updating the aggregate data, and F is the type of the value to generate from the aggregate data. Each summary type must implement the following methods (a sketch of the interface appears after the list):

• boolean update(V v): updates the aggregate data using value v (e.g., sum += v; count++;) and then returns true if the aggregate data is changed; false otherwise.

• boolean update(Summary<V, F> s): updates the aggregate data using another summary s (e.g., sum += s.sum; count += s.count;) and returns true if the aggregate data is changed; false otherwise.

• F value(): returns a value computed using the current aggregate data (e.g., return sum/count;).
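The sketch below spells out this interface together with a minimal Min summary of the kind used by the operators in Section 4.3.3 (the method signatures follow the description above; the Min implementation is our illustration and may differ from G*'s):

  // The three methods every summary must provide, as described above.
  interface Summary<V, F> {
      boolean update(V v);              // fold one value into the aggregate data
      boolean update(Summary<V, F> s);  // fold another summary into the aggregate data
      F value();                        // derive the final value from the aggregate data
  }

  // A summary that keeps the minimum of all comparable values it has seen,
  // e.g., Min<Double> for shortest distances or Min<VID> for component IDs.
  class Min<V extends Comparable<V>> implements Summary<V, V> {
      private V min;

      Min(V initial) { this.min = initial; }

      @Override public boolean update(V v) {
          if (v != null && v.compareTo(min) < 0) { min = v; return true; }
          return false;                  // unchanged: no smaller value seen
      }

      @Override public boolean update(Summary<V, V> s) {
          return update(s.value());      // combine by taking the smaller minimum
      }

      @Override public V value() { return min; }
  }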


[Figure 14: Clustering Coefficient Example. Vertex v has neighbors a, b, and c. Step (1): a summary ({a,b,c}, 0) for (v, {G1,G2,G3}) is routed to each neighbor. Steps (2)-(4): the summaries return as ({a,b,c}, 1), ({a,b,c}, 2), and ({a,b,c}, 1). Step (5): the combiner merges them into ({a,b,c}, 1+2+1) for (v, {G1,G2,G3}). Step (6): c_coeff(v) = (1+2+1)/(3·2) = 2/3 for {G1,G2,G3}.]

1  public boolean update(Vertex v) {
2    boolean updated = false;
3    for (Vertex.Edge e : v.edges())
4      if (neighbors.contains(e.target())) {
5        triangles++;
6        updated = true;
7      }
8    return updated;
9  }
10
11 public Double value() {
12   return 1.0*triangles/neighbors.size()/(neighbors.size()-1);
13 }

Figure 15: Implementation of CCoeffSummary

Figure 15 shows an actual summary implementation that computes the clustering coefficient of a vertex. Clustering coefficients are used to determine if a given graph represents a small-world network [15]. The clustering coefficient of a vertex v is defined as the ratio of the number of edges between the neighbors of the vertex (i.e., the number of triangles that involve v) to the maximum number of edges that could exist between the neighbors of v. For instance, vertex v in Figure 14 has three neighbors (a, b, c). Vertex a has one edge to a neighbor of v (i.e., b). Vertex b has two such edges (i.e., b-a and b-c). Vertex c has one such edge (i.e., c-b). Therefore, the clustering coefficient of vertex v is (1 + 2 + 1)/(3 · 2) = 4/6 = 2/3.

CCoeffSummary in Figure 15 has two variables: (a) neighbors, a collection that contains the IDs of the neighbors of a vertex, and (b) triangles, an int variable for counting the number of edges between the neighbors. Figure 14 shows an example where the clustering coefficient of vertex v is computed using three CCoeffSummary instances whose neighbors and triangles are initialized to {a, b, c} and 0, respectively (see step (1)). In step (2), a summary is updated based on vertex a using the update(Vertex v) method in Figure 15. In this case, triangles is incremented once due to the edge from a to b, which is another neighbor of v. Two other summaries are updated similarly in steps (3) and (4), respectively. In step (5), these summaries are combined into one using the update(Summary<Vertex, Double> s) method (omitted in Figure 15). In step (6), the clustering coefficient of v is computed using the value() method in Figure 15.

1  public void init() {
2    while (input(0).hasNext()) {
3      Vertex v = input(0).next();
4      for (VID n : v.neighbors())
5        cmbr.update(v.id(), new CCoeffSummary(v.neighbors()), n, v.graphs());
6    }
7  }
8
9  public Vertex next() {
10   return cmbr.next();
11 }

Figure 16: Computation of Clustering Coefficient


4.3.2 Combiner
The example in Figure 14 updates summaries based on vertices. It also combines all of the summaries that have the same target vertex (i.e., the vertex for which the clustering coefficient is computed) into one summary to compute the final value. Our combiner primitive allows operators to perform the above operations while ignoring the low-level details of distributed data management. The combiner provides the following methods:

• void update(VID t, Summary<V, F> s, Set<GID> g): associates summary s with vertex t and a set of graphs g. If a summary is already associated with t, that summary is updated using s.

• void update(VID t, Summary<V, F> s, VID i, Set<GID> g): updates summary s based on vertex i and then performs update(t, s, g).

• boolean hasNext(): returns true if the combiner can compute a new final value (e.g., the clustering coefficient of a vertex) since it has received all of the needed summaries. This method returns false if no further new values can be computed (i.e., the summaries for all of the vertices are completely processed). This method blocks if the above is not yet known.

• Vertex next(): inserts a new final value into a proper Vertex object (e.g., inserts the clustering coefficient of v into an object representing v) and then returns the object. This method blocks in situations where hasNext() blocks.

Figure 16 shows the actual implementation of an operator that computes the clustering coefficient of each vertex. As its initial task, the operator processes a stream of Vertex objects (lines 2-6). For each Vertex object v (line 3), the operator uses a combiner cmbr to route a summary to each neighbor n of v as in step (1) in Figure 14 (lines 4-5 in Figure 16). In this case, the underlying G* server sends that summary to the server that manages n. When summaries return from their trip to a neighbor of v, the G* server combines these summaries into one using update(Summary<V, F> s). By calling value() on the resulting summary, the server computes the clustering coefficient of v and adds it to the Vertex object that represents v. Whenever a Vertex object is made available by the system code as above, the next() method of cmbr returns the object (lines 9-11 in Figure 16).


1 void init() {
2   cmbr.update(src, new Min<Double>(0.0), graphs(graphPred));
3 }
4
5 void compute(Vertex v, Summary<Double, Double> s) {
6   for (Edge e : v.edges()) {
7     cmbr.update(e.target(), new Min<Double>(s.value() + e.weight()), v.graphs());
8   }
9 }

Figure 17: Computation of Shortest Distance

1  protected void init() {
2    Iterator<Vertex> i = vertices(graphPred);
3    while (i.hasNext()) {
4      Vertex v = i.next();
5      compute(v, new Min<VID>(v.id()));
6    }
7  }
8
9  protected void compute(Vertex v, Summary<VID, VID> s) {
10   for (Edge e : v.edges(edgePredicate))
11     cmbr.update(e.target(), s, v.graphs());
12 }

Figure 18: Computation of Connected Components


4.3.3 BSP Operator
The BSP (Bulk Synchronous Parallel) model has been frequently used in various parallel graph algorithm implementations [6, 8, 19]. This model uses a number of iterations called supersteps during which a user-defined function is applied to each vertex in parallel. This custom function changes the state variables of a vertex based on the current values of these variables and the messages sent to the vertex during the previous superstep. The overall computation completes when a certain termination condition is met (e.g., no state variable changes for any of the vertices).

In G*, operators that support the BSP model can be implemented by extending the BSPOperator class and completing the following method (a sketch of the superstep loop that drives such operators appears below):

• void compute(Vertex v, Summary<V, F> s): carries out a certain task based on vertex v and summary s for v. This method is invoked only when a summary bound to v has arrived for the first time, or when the summary already associated with v is updated with the summaries received during the previous superstep.
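The superstep loop that drives such an operator can be pictured roughly as follows (our sketch of the model described above, not G*'s scheduler; Vertex, Summary, and the vertexOf lookup are assumed to come from the surrounding framework):

  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Map;
  import java.util.Set;

  // In each superstep, compute() is invoked only on vertices whose summary
  // arrived for the first time or was changed by summaries received in the
  // previous superstep; the run ends when no summary changes.
  abstract class SimpleBspDriver<V, F> {
      private final Map<String, Summary<V, F>> summaries = new HashMap<>();
      private Set<String> active = new HashSet<>();      // vertices to visit in the next superstep

      protected abstract void compute(Vertex v, Summary<V, F> s);
      protected abstract Vertex vertexOf(String vertexId); // assumed lookup helper

      // Called (via the combiner) to send a summary to a vertex.
      protected void send(String targetId, Summary<V, F> s) {
          Summary<V, F> existing = summaries.get(targetId);
          if (existing == null) { summaries.put(targetId, s); active.add(targetId); }
          else if (existing.update(s)) active.add(targetId);   // changed -> revisit next superstep
      }

      void run() {
          while (!active.isEmpty()) {                      // one iteration = one superstep
              Set<String> current = active;
              active = new HashSet<>();
              for (String id : current)
                  compute(vertexOf(id), summaries.get(id));
          }
      }
  }

Each pass over the active set is one superstep; the loop ends exactly when no summary changed in the previous superstep, matching the termination condition above.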

Figure 17 shows an operator implementation that finds, in each graph that satisfies predicate graphPred, the shortest distance from a single source vertex to every other vertex. The presented code is equivalent to the Pregel counterpart by Malewicz et al. [19]. In Figure 17, a summary Min<V> is used to keep track of the shortest distance to each vertex.

In Figure 17, the init() method assigns a summary containing 0 to the source vertex (src). Since this summary is newly associated with src, the compute() method is invoked on src during the next superstep. For each edge e of the current vertex v (line 6), this method computes the distance to the neighbor vertex (e.target()) and sends the

distance in the form of a Min<Double> summary (line 7), which will later be combined with all of the other summaries sent to the same vertex. The compute() method is invoked on a vertex only when a shorter distance to the vertex was found in the previous superstep. If no summaries in the system are updated (i.e., no shorter distance is found for any of the vertices), the whole BSP process completes. In this case, cmbr.next() returns one Vertex object at a time, which contains the shortest distance from the source vertex.

Figure 18 shows another operator implementation that finds all of the connected components in each undirected graph that matches predicate graphPred. Initially, every relevant vertex in the system is assigned a Min<VID> summary which contains the ID of the vertex. In each superstep, the compute() method is invoked only on the vertices to which a smaller ID value is sent. In this way, the smallest vertex ID within a connected component is eventually sent to all of the vertices within the component. This smallest ID is used as the component ID. The rest of this implementation is the same as that of the shortest distance operator (Figure 17).

4.3.4 Discussion

G*'s processing primitives have the following unique benefits over others' primitives [5, 8, 19]:

1. Summary implementations promote code reuse (e.g., the Min<V> summary is used in Figures 17 and 18).

2. Summaries allow implementation of graph algorithms at a high level (e.g., simply use Min<V> rather than writing code that finds the minimum among many values). For this reason, summary-based operator implementations are succinct. For example, the compute() method in Figure 17 has only 5 lines of code whereas the equivalent Pregel code [19] has 12 lines of code.

3. Summaries with the same target vertex are combined into one as they arrive at the destination server, in contrast with other primitives which incur high space overhead since they keep all of the input messages until they are consumed by the user-defined code [5, 8, 19].

4. G*'s processing primitives effectively hide the complexity of dealing with multiple graphs. In the 50 lines of code from Figures 13, 15, 16, 17 and 18, only line 5 in Figure 16, line 7 in Figure 17, and line 11 in Figure 18 reveal the presence of multiple graphs.

5. EXPERIMENTAL RESULTS

This section presents experimental results that we have

obtained by running G* on a 64-core server cluster. In this cluster, each of eight nodes has two quad-core Xeon E5430 2.67 GHz CPUs, 16GB RAM, and a 2TB hard drive.

Table 3 summarizes the datasets used for the experiments. The Twitter dataset contains a subset of messages sent between Twitter users during February 2012. This dataset was collected using Twitter's Gardenhose API [29]. The Yahoo dataset contains communication records between end-users in the Internet and Yahoo servers [30].



[Figure 19: G* vs Postgres. x-axis: number of graphs (1-16); y-axis: size (MB) and query time (sec). Series: Postgres (data size), Postgres (query time), G* (data size), G* (index size), G* (query time), G* (query time, PGI).]

[Figure 20: G* vs Phoebus. x-axis: number of graphs (1-10); y-axis: query time (sec). Series: Phoebus (all graphs), Phoebus (last graph), G* (all graphs), G* (all graphs, PGI), G* (last graph).]

[Figure 21: Number of VL Maps. x-axis: number of graphs (1-64); y-axis: number of VL maps. Series: Theoretic Maximum, Yahoo, Tree, Twitter, Twitter (100%), Twitter (20%), Twitter (4%), Twitter (0%).]

                 Twitter    Yahoo      Tree
# of vertices    33M        112M       1B
# of edges       62M        335M       1B
# of records     78M        1.1B       1B
data size        11GB       107.9GB    21GB

Table 3: Datasets

1 select graph, src,
2   1.0*count(i)/count(distinct des)/(count(distinct des)-1)
3 from edge natural left outer join (
4   select edge.graph, edge.src, e.src i, e.des
5   from edge join edge e
6   on edge.graph = e.graph and edge.des = e.src
7 ) t
8 group by graph, src
9 having count(distinct des) > 1

Figure 25: Clustering Coefficient in SQL

The Tree dataset was obtained by running a binary tree generator. Using each of the Twitter and Yahoo datasets, we constructed a series of graphs that are hourly snapshots of the underlying network. A graph in each series was constructed by first cloning the previous graph (Section 3.3.3) and then adding new vertices and edges to the new graph. On these graphs, we ran the queries discussed in Section 2.3 (Figure 5).

5.1 G* vs Prior Database and Graph Systems

Figure 19 shows results that highlight the superiority of

G* over Postgres [23], a relational database system. These results were obtained by running Postgres and G* on identical servers. We used a set of 16 cumulative graphs constructed from the Twitter dataset. Each graph contained 3,000 additional edges compared to its previous graph. Therefore, the last graph contained 48,000 edges. We added this relatively small number of edges due to the performance limitations of Postgres. On these graphs, we computed the clustering coefficient of every vertex. Postgres used a table edge(graph, src, des) to store information about the edges in the graphs, and ran the SQL query in Figure 25 (see footnote 4).

Footnote 4: Lines 4-6 in Figure 25 find all of the 2-hop paths from src to des via i. The left outer join on lines 3-7 then finds all of the edges between the neighbor vertices i and des of vertex src while keeping all of the edges from src. Line 2 computes the ratio of the number of edges between the neighbors of src (i.e., count(i)) to the maximum number of edges that could exist between the neighbors (i.e., count(distinct des)*(count(distinct des)-1)).

Figure 19 shows that Postgres has much higher storage overhead than G* (compare "Postgres (data size)" and "G* (data size)"). The reason behind this difference is that Postgres stores one record for every combination of an edge and a graph, whereas G* stores each vertex and its edges only once regardless of the number of graphs that contain them.
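A back-of-the-envelope count of edge records, derived only from the setup above (16 cumulative graphs, 3,000 new edges per graph, one Postgres record per edge-graph combination, vertex records ignored), illustrates the gap:

\[
\text{Postgres edge records} \;=\; \sum_{i=1}^{16} 3000\,i \;=\; 3000 \cdot \frac{16 \cdot 17}{2} \;=\; 408{,}000,
\qquad
\text{G* edge records} \;=\; 48{,}000,
\]

since under this simplification G* stores each of the 48,000 distinct edges of the last (largest) cumulative graph only once.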

In our measurement, the data size and index size of Postgres were nearly identical because the index covered all of the attributes (i.e., graph, src, des) that formed the primary key. By comparing "Postgres (data size)", which also represents the index size of Postgres, with "G* (index size)", we can see that the index size of G* is also much smaller than that of Postgres. The reason for this difference is that G*'s index contains <vertex ID, disk location> pairs in a deduplicated fashion.

In G*, the index size is in general much smaller than the amount of disk-resident graph data (compare "G* (index size)" with "G* (data size)"). Therefore, the entirety or a large fraction of the index can be kept in memory, enabling fast data lookup. Furthermore, G* can process each vertex once and then share the result across relevant graphs. The curves labeled "Postgres (query time)" and "G* (query time)" in Figure 19 clearly show the performance benefit of G* over Postgres.

To study the importance of sharing computations across graphs, we examined another situation where G* constructed one VL map for each graph, ignoring the commonalities among graphs (see "G* (query time, PGI)"). Figure 19 shows that "G* (query time)" and "G* (query time, PGI)" are significantly different, whereas "G* (query time, PGI)" and "Postgres (query time)" are similar. This result shows that G*'s ability to share computations across graphs is a main contributor to the superiority of G* over Postgres.

The next result compares G* with Phoebus, an open-source implementation of Pregel [19]. We used Phoebus since the original Pregel system was not publicly available. As in a previous work on Pregel [19], we performed single-source shortest distance queries on complete binary trees.




[Figure 22: Index Size. x-axis: graphs per CGI (1-64); y-axis: index size (normalized). Series: Twitter, Twitter (100%), Twitter (20%), Twitter (4%), Twitter (0%).]

[Figure 23: Query Time (Degree). x-axis: graphs per CGI (1-64); y-axis: query time (normalized). Series: Twitter, Twitter (100%), Twitter (20%), Twitter (4%), Twitter (0%).]

[Figure 24: Q. Time (C. Coefficient). x-axis: graphs per CGI (1-64); y-axis: query time (normalized). Series: Twitter, Twitter (100%), Twitter (20%), Twitter (4%), Twitter (0%).]

One major difference is that we ran queries on multiple trees (i.e., graphs) rather than a single tree. This experiment used a series of 10 cumulative graphs, each of which contained 25,000 more edges than its previous graph.

Figure 20 shows the result obtained by running Phoebus and G* on identical servers (refer to Section 5.3 for our results on the scalability of G*). In this result, "Phoebus (last graph)" and "G* (last graph)" represent the amount of time that each system took to process the largest graph. This result shows that G* substantially outperforms Phoebus even when it processes a single graph. "Phoebus (all graphs)" and "G* (all graphs)" demonstrate that the performance difference between Phoebus and G* becomes larger when they process multiple graphs since G* can share computations across graphs, in contrast with Phoebus. When we disabled shared computation by using the PGI (Section 3.3.1) instead of the CGI, the overall query time increased significantly ("G* (all graphs, PGI)"), which points out the importance of sharing computations across graphs.

5.2 Impact of Indexing

The CGI associates the IDs and disk locations of vertices

using VL maps (Section 3.3). Both the lookup and update overhead of the CGI increase with more VL maps. Figure 21 shows how the number of VL maps varies as the index covers more graphs. Using the Twitter and Yahoo datasets, we generated two series of 64 cumulative graphs that are hourly snapshots of the underlying network. On these graphs, which reflect practical situations, the number of VL maps increased modestly (see "Yahoo" and "Twitter") compared to the theoretic maximum (2^N, which is the number of all possible combinations of N graphs).
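To put the theoretic maximum in perspective, with N = 64 graphs it evaluates to

\[ 2^{64} = 18{,}446{,}744{,}073{,}709{,}551{,}616 \approx 1.8 \times 10^{19}, \]

so the VL map counts observed for the real datasets cover only a vanishingly small fraction of all possible graph combinations.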

To carefully examine the impact of similarity between graphs on the number of VL maps, we also constructed other artificial series of graphs, each of which has a certain fraction (e.g., 0%, 4%, 20%, 100%) of different edges compared to its previous graph. In Figure 21, "Twitter (100%)" shows the result when the graphs had no commonality among them and therefore only one VL map was constructed for each graph. In Figure 21, the largest number of VL maps were created when 4% of the edges were changed in each graph.

As explained in Section 3.4, we can control the cost of managing VL maps by grouping graphs and constructing

a separate index for each group of graphs. Figures 22, 23 and 24 show how the index size and query time vary as the number of graphs managed by each index increases. Figure 24 shows that the query execution time tends to decrease and then increase (compare the query execution times on "Twitter (20%)" at 16 graphs per CGI and 64 graphs per CGI) as more graphs are indexed together. The reason is that the cost of managing VL maps increases with more graphs.

Our index splitting method (Section 3.4) can automatically determine the number of graphs that each CGI needs to cover. In Figures 23 and 24, the arrows indicate the actual query execution times that our technique achieved on different sets of graphs. This result shows that our index splitting technique makes reliable decisions while achieving near optimal performance. From Figures 23 and 24, we can also see that, despite its inherent design for shared data storage and computation, G* does not pay any noticeable penalty when there is no commonality between graphs (see "Twitter (100%)"). In this case, the CGI naturally converges to the PGI, which ignores commonalities among graphs and therefore keeps one VL map for each graph.

5.3 Scalability

This section shows results on the scalability of G*. These

results were obtained by running the first three queries in Figure 5 on a varying number of G* servers. Among the three datasets in Table 3, only the Tree dataset was used for the shortest distance query. In this case, the root of the tree was selected as the source vertex. When we used the other datasets, it was hard to obtain reliable results since the number of visited vertices varied significantly depending on the choice of source vertex.

In one set of experiments (Figure 26), we increased both the number of servers and the size of the graph at the same rate. When 64 G* servers were used, the largest graph for each dataset was created using the entirety of the dataset. Figure 26 shows that G* achieved the highest level of scalability when it executed the vertex degree query. A main reason is that vertex degrees can be computed locally on each server without causing network overhead. Furthermore, the number of distinct vertices tends to increase slowly compared to the number of distinct edges.



[Figure 26: Scaleup Result. Panels: (a) Twitter, (b) Yahoo, (c) Tree. x-axis: number of G* servers (relative graph data size), 1-64; y-axis: query time (sec), 1-4096. Series: coefficient, degree, and (Tree only) distance.]

[Figure 27: Speedup Result. Panels: (a) Twitter, (b) Yahoo, (c) Tree. x-axis: number of G* servers, 1-64; y-axis: query time (sec), 1-512. Series: coefficient, degree, and (Tree only) distance.]

Therefore, each server is assigned a relatively small number of vertices as the number of edges increases in proportion to the number of servers.

When G* ran the clustering coefficient query, the query time increased gradually as both the number of servers and the amount of data increased. We conjecture that this phenomenon was mainly caused by the increase in network traffic. We plan to address this issue (i.e., further improve the scalability of G*) by developing a data distribution technique that can reduce the number of edges whose end points are on different servers. The result on the shortest distance query is similar to that on the clustering coefficient query.

Figure 27 shows the result obtained by creating a graph that contains one million edges and then distributing the graph over an increasing number of servers. In this case, the execution time of each query decreased as more servers were used. The impact of each query on the query execution time is consistent with that in the previous "scaleup" experiment.

6. RELATED WORK

Google's Pregel is a recent parallel graph processing

system [19]. A Pregel program includes a user-defined function that specifies a task for each vertex in a superstep. Using this function, Pregel can execute graph algorithms such as PageRank and shortest paths on a server cluster. Pregel achieves a higher level of scalability compared to previous graph processing systems such as Parallel BGL [8] and CGMGraph [5]. Several open-source versions of Pregel are under active development, one of which, Phoebus [22], is compared with G* in Section 5. Other recent parallel

graph processing systems include Trinity [28], Surfer [6], Angrapa [27], and Pegasus [13]. In contrast to these systems that can process one graph at a time, G* can efficiently execute sophisticated queries on multiple graphs. The performance benefits of G* over the traditional graph processing systems are experimentally demonstrated in Section 5.1.

Current graph compression techniques typically store a single graph by either assigning short encodings to popular vertices [26] or using reference compression. Reference compression refers to an approach that represents an adjacency list i using a bit vector which references a similar adjacency list j and a separate collection of elements needed to construct i from j [1, 4]. The above techniques and other previous techniques for compressing graphs [20] and binary relations [3] are not well suited for the target applications of G*. In particular, the above compression techniques require reconstructing the original vertices and edges, which would slow down the system operation. G*'s storage and indexing mechanisms do not have these limitations but rather expedite queries on graphs.

Researchers have developed various types of graph indexing techniques. Han et al. provided a comprehensive survey and evaluation studies on indexing techniques for pattern matching queries [9]. Jin et al. have recently presented an efficient indexing technique for reachability queries [12] with detailed comparison to other related techniques. In contrast to these techniques, G*'s indexing approach strives to minimize, with low update overhead, the size of the mapping from the vertex and graph IDs to the corresponding graph data on disk. This technique also enables fast cloning of



large graphs, and allows G* to process each vertex and its edges once and then share the result across relevant graphs to speed up queries on multiple graphs.

Previous studies on the evolution of dynamic networks were discussed in Section 1 of this paper.

7. CONCLUSION AND FUTURE WORK

G* is a new parallel system for managing a series of large

graphs that represent an evolving network at different points in time. This system achieves scalable, efficient storage of graphs by taking advantage of their commonalities. In G*, each server is assigned a subset of vertices and their outgoing edges. Each G* server keeps track of the variation of each vertex and its edges over time through deduplicated versioning on disk. Each server also maintains an index for quickly finding the disk location of a vertex and its edges given relevant vertex and graph IDs. Due to its space efficiency, this index generally fits in memory and therefore enables fast lookups.

G* supports sophisticated queries on graphs using operators that process data in parallel. G* provides processing primitives that enable succinct implementation of these operators for solving graph problems. G* operators process each vertex and its edges once and use the result across all relevant graphs to accelerate multi-graph queries. We have experimentally demonstrated the above benefits of G* over traditional database systems and state-of-the-art graph processing systems.

We are developing techniques for archiving high-volume graph data streams on disk such that disk seeks are minimized during later query execution. We are also studying the problem of distributing graph data over multiple servers in a manner that minimizes network traffic during query execution. In addition to the capability of sharing computations across multiple graphs, we intend to explore techniques for doing the same across multiple queries. Furthermore, we observe that many queries on large graphs inherently take a long time. Therefore, we plan to study ways to accurately estimate both the completion time and the result of queries. Other future research items include developing techniques for efficiently masking failures as well as incorporating logging into G* in order to support recalling graphs from any point in the past.

8. REFERENCES

[1] M. Adler and M. Mitzenmacher. Towards Compressing Web Graphs. In DCC, pages 203–212, 2001.
[2] A. M. Alashqur, S. Y. W. Su, and H. Lam. OQL: A Query Language for Manipulating Object-oriented Databases. In VLDB, pages 433–442, 1989.
[3] J. Barbay, M. He, J. I. Munro, and S. S. Rao. Succinct Indexes for Strings, Binary Relations and Multi-Labeled Trees. In SODA, pages 680–689, 2007.
[4] P. Boldi and S. Vigna. The Webgraph Framework I: Compression Techniques. In WWW, pages 595–602, 2004.
[5] A. Chan, F. K. H. A. Dehne, and R. Taylor. CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines. IJHPCA, 19(1):81–97, 2005.
[6] R. Chen, X. Weng, B. He, and M. Yang. Large Graph Processing in the Cloud. In SIGMOD, pages 1123–1126, 2010.
[7] Facebook Newsroom. http://newsroom.fb.com/content/default.aspx?NewsAreaId=22.
[8] D. Gregor and A. Lumsdaine. The Parallel BGL: A generic library for distributed graph computations. In POOSC, 2005.
[9] W.-S. Han, J. Lee, M.-D. Pham, and J. X. Yu. igraph: A framework for comparisons of disk-based graph indexing techniques. PVLDB, 3(1):449–459, 2010.
[10] H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, pages 405–418, 2008.
[11] Java Remote Method Invocation (RMI). http://download.oracle.com/javase/tutorial/rmi/index.html.
[12] R. Jin, N. Ruan, S. Dey, and J. X. Yu. Scarab: scaling reachability computation on large graphs. In SIGMOD, pages 169–180, 2012.
[13] U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system. In ICDM, pages 229–238, 2009.
[14] G. Kossinets and D. J. Watts. Empirical Analysis of an Evolving Social Network. Science, 311(5757):88–90, 2006.
[15] C. J. Kuhlman, V. S. A. Kumar, M. V. Marathe, S. S. Ravi, and D. J. Rosenkrantz. Finding critical nodes for inhibiting diffusion of complex contagions in social networks. In PKDD, pages 111–127, 2010.
[16] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In KDD, pages 611–617, 2006.
[17] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic Evolution of Social Networks. In KDD, pages 462–470, 2008.
[18] J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD, pages 177–187, 2005.
[19] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135–146, 2010.
[20] S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded error. In SIGMOD, pages 419–432, 2008.
[21] Neo4j, the graph database. http://neo4j.org/.
[22] Phoebus. https://github.com/xslogic/phoebus.
[23] PostgreSQL 9.0. http://www.postgresql.org/.
[24] C. Ren, E. Lo, B. Kao, X. Zhu, and R. Cheng. On Querying Historical Evolving Graph Sequences. PVLDB, 4(11):726–737, 2011.
[25] Stanford Large Network Dataset Collection. http://snap.stanford.edu/data/.
[26] T. Suel and J. Yuan. Compressing the graph structure of the web. In DCC, pages 213–222, 2001.
[27] The Angrapa package. http://wiki.apache.org/hama/GraphPackage.
[28] Trinity. http://research.microsoft.com/en-us/projects/trinity/.
[29] Twitter Streaming API. https://dev.twitter.com/docs/streaming-api/methods.
[30] Yahoo! Network Flows Data. http://webscope.sandbox.yahoo.com/catalog.php?datatype=g.
[31] P. Zhao and J. Han. On graph query optimization in large networks. PVLDB, 3(1):340–351, 2010.


