SC 2019

GraphM: An Efficient Storage System for High Throughput of Concurrent Graph Processing

Jin Zhao1, Yu Zhang1, Xiaofei Liao1, Ligang He2, Bingsheng He3, Hai Jin1, Haikun Liu1, Yicheng Chen1

1 SCTS/CGCL, Huazhong University of Science and Technology, China

2 University of Warwick, UK

3 National University of Singapore, Singapore

Outline

• Background and Challenges

• GraphM

• Experimental Results

• Conclusion

Graph Processing

• Graphs → ubiquitously preferred data representation

Examples: social network, Internet, road network

• Graph processing → rapidly growing demand in the real world

[Figure: Number of jobs traced on a social network. x-axis: Time (hours), 0 to 160; y-axis: Number of concurrent jobs, 0 to 35.]

More than 30 jobs are concurrently executed on the same platform at the peak time

Concurrent Graph Processing

• Many concurrent graph processing jobs are often handled on the same underlying graph (or its different snapshots) to provide various information for different products

[Figure: Concurrent iterative graph processing jobs (K-means, SSSP, PageRank) running on a graph processing framework over shared graph data.]

Existing Graph Processing Systems

Chaos, PowerGraph, GraphChi, GridGraph, …

• Higher sequential memory bandwidth

• Better data locality

• Smaller memory consumption

• Fewer redundant data accesses

Mainly designed to efficiently handle a single graph processing job

[Figure: Execution of concurrent graph processing jobs on existing systems. Job 1, Job 2, and Job 3 each hold their own graph copy and job-specific data in memory and cache; the graph data resides on secondary storage.]

Challenges: Redundant Data Access Overhead

Reason 1: A lot of redundant consumption of memory resources and data access channels

[Figure: (a) Total memory usage (GB) and (b) total last-level cache (LLC) misses (billions) versus the number of concurrent jobs (1, 2, 4, 8) for PageRank, WCC, BFS, and SSSP.]

The memory usage and the total amount of graph data loaded into the LLC increase when more concurrent jobs are executed on the same platform

Challenges: Redundant Data Access Overhead

Reason 2: Serious contention for storage resources and data access channels

[Figure: (c) Average number of LLC misses per instruction (LPI) and (d) average execution time (seconds) versus the number of concurrent jobs (1, 2, 4, 8) for PageRank, WCC, BFS, and SSSP.]

The LLC misses per instruction (LPI) and the average execution time of each job increase when more concurrent jobs are executed on the same platform

Observations

[Figure: Information traced on the social network over time (hours 1 to 6). (a) Percentage of the graph shared by more than 1, 2, 4, and 8 jobs (#>1, #>2, #>4, #>8); (b) average data access times.]

Spatial Similarity

A large proportion of the same graph can be shared by multiple concurrent jobs during the traversals

Temporal Similarity

The same graph data may be repeatedly accessed by different concurrent jobs over a period of time

Motivations

➢ How to utilize the spatial/temporal similarities?

Spatial Similarity

Maintain a single copy of the same graph structure data in the storage to serve the concurrent jobs

Temporal Similarity

Consolidate the accesses to the same graph structure data for concurrent jobs

Motivations

➢ How to utilize the spatial/temporal similarities in a practical way?

Option #1: Design a graph processing framework

• New programming model for graph algorithms

• Requires changes in user-level applications

Option #2: Develop a graph storage system

• Several APIs for existing graph processing frameworks

• Can be transparent to application programmers

GraphM Explores…

• Traditional graph storage approach

D1 = (V1, E1, W1, S1)

D2 = (V2, E2, W2, S2)

…

DJ = (VJ, EJ, WJ, SJ)

Most graph structure data G = (V, E, W) is the same for different concurrent graph processing jobs

• Main goals

The storage of the same graph structure data (V, E, W) and the data access to it can be shared by different concurrent graph processing jobs
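The goal above can be pictured with a small data model. The sketch below is illustrative only: the class names are hypothetical and not taken from GraphM, and it assumes that S denotes the data each job keeps for itself. One read-only copy of the structure G = (V, E, W) is referenced by every job, while each job holds only its own state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GraphStructure:
    """Shared, read-only structure G = (V, E, W)."""
    vertices: tuple   # V
    edges: tuple      # E, as (src, dst) pairs
    weights: tuple    # W, one weight per edge


@dataclass
class Job:
    """One concurrent job: a reference to the shared structure plus its own state."""
    name: str
    graph: GraphStructure                       # shared, not copied
    state: dict = field(default_factory=dict)   # S_j (assumed: job-specific data)


shared = GraphStructure(
    vertices=(0, 1, 2),
    edges=((0, 1), (1, 2)),
    weights=(1.0, 2.0),
)
jobs = [Job("PageRank", shared), Job("SSSP", shared)]
jobs[0].state["rank"] = {v: 1.0 for v in shared.vertices}
jobs[1].state["dist"] = {0: 0.0}
```

Both jobs point at the same `GraphStructure` object, so the structure is stored once no matter how many jobs run, and only the per-job state grows with the number of jobs.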

GraphM Explores…

• The challenges of utilizing the similarities

- Concurrent jobs access the shared partitions individually, along different graph paths

- The processing time of each graph structure partition varies across jobs

• Our expectations

- Load the shared graph partitions in a common order for concurrent jobs

- Take into account the temporal similarities when loading the shared graph partitions

[Figure: Two timelines of one iteration of graph processing, showing the partitions (P1 to P4) handled by Job 1, Job 2, and Job 3 in iterations n1, n2, and n3.]

Overview of GraphM

We design a structure-aware graph labelling scheme

We propose a Share-Synchronize mechanism

We develop a graph partition loading strategy

We enable GraphM without user-application change

Partition Labeling

• Graph partitions are traversed once and logically labeled as chunks

[Figure: The original graph (vertices 0 to 5) and its graph representation format specific to GridGraph, split into partitions P1 to P4. Edges as listed: (v0, v1), (v0, v2), (v3, v1), (v4, v2), (v0, v3), (v0, v5), (v2, v4), (v2, v5), (v5, v3), (v5, v4).]

Each chunk consists of two edges

Chunk tables: each entry is <ID of source vertex, number of its outgoing edges>

Chunk 1: <v0, 2>

Chunk 2: <v3, 1><v4, 1>

Chunk 3: <v0, 2>

…
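The labeling step can be sketched in a few lines. This is a minimal illustration, assuming fixed-size chunks of two edges as in the slide's toy example; the function name and the list-of-tuples layout are hypothetical, and how GraphM actually sizes chunks is not stated here.

```python
from itertools import groupby


def build_chunk_tables(edges, edges_per_chunk=2):
    """Split an edge list into fixed-size chunks and summarize each chunk
    as a list of (source vertex, number of its edges in the chunk)."""
    tables = []
    for start in range(0, len(edges), edges_per_chunk):
        chunk = edges[start:start + edges_per_chunk]
        # Consecutive edges with the same source collapse into one entry.
        table = [(src, len(list(group)))
                 for src, group in groupby(chunk, key=lambda e: e[0])]
        tables.append(table)
    return tables


# The first six edges from the slide, in the order they are listed.
edges = [("v0", "v1"), ("v0", "v2"), ("v3", "v1"), ("v4", "v2"),
         ("v0", "v3"), ("v0", "v5")]
tables = build_chunk_tables(edges)
# tables[0] == [("v0", 2)]
# tables[1] == [("v3", 1), ("v4", 1)]
# tables[2] == [("v0", 2)]
```

The three resulting tables match the entries shown for Chunk 1, Chunk 2, and Chunk 3. The edge data itself is not moved, which is what "logically labeled" suggests: the tables only record where each chunk's edges come from.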

Share-Synchronize Mechanism

• Memory Sharing of Graph Structure (Sharing())

- Load an assigned active partition

- Resume or suspend corresponding jobs

- Share the graph structure data

[Figure: The graph G = {P1, P2, P3, P4} resides on disk. The graph structure data of the loaded partition (P3) is held in memory alongside S1, S2, S3, … for Job 1, Job 2, Job 3, …; jobs are either suspended or executable. Job 1 and Job 3 each traverse the edges e1 to e4 of P3.]

• The number of edges that need to be processed is different

• The computational complexity of the edge processing function is different

Share-Synchronize Mechanism

• Fine-grained Synchronization

- Profiling phase

  - Obtain T(E), T(Fj)

- Syncing phase

  - Acquire workloads of each chunk

  - Unevenly allocate CPU resources

T(E): The average data access time for each edge

T(Fj): The computational complexity of the edge processing function of job j

[Figure: Chunks of the partition loaded in memory are streamed regularly into the cache and processed by the executable jobs (Job 1, Job 2, Job 3, …).]
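The idea of uneven CPU allocation can be illustrated with a small sketch. The cost model here is an assumption, not the formula from the talk: the load of job j on a chunk is estimated as (edges job j must process) × (T(E) + T(Fj)), and threads are split in proportion to that estimate. The function name and parameters are hypothetical.

```python
def allocate_threads(edge_counts, t_edge, t_func, total_threads):
    """Split total_threads across jobs in proportion to their estimated load.

    edge_counts[j]: edges job j must process in the current chunk
    t_edge:         T(E), average data access time per edge
    t_func[j]:      T(Fj), cost of job j's edge processing function
    """
    # Assumed cost model: per-edge access time plus per-edge compute time.
    loads = [n * (t_edge + f) for n, f in zip(edge_counts, t_func)]
    total = sum(loads)
    if total == 0:
        return [0] * len(loads)
    exact = [total_threads * load / total for load in loads]
    alloc = [int(x) for x in exact]
    # Hand out leftover threads to the largest fractional remainders.
    leftover = total_threads - sum(alloc)
    order = sorted(range(len(loads)),
                   key=lambda j: exact[j] - alloc[j], reverse=True)
    for j in order[:leftover]:
        alloc[j] += 1
    return alloc


allocation = allocate_threads([100, 100, 50], 1.0, [1.0, 3.0, 1.0], 16)
# allocation == [5, 9, 2]
```

A job with more edges to process, or a more expensive edge function, receives more threads, so that all jobs tend to finish the shared chunk at about the same time.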

Ensuring Consistent Snapshots

• The mutations (by some jobs) and updates (over time) of the shared graph structure data are isolated among concurrent jobs to ensure the correctness of the processing

[Figure: Virtual addresses of Job 1 and Job 2 mapped to physical addresses in shared memory (low to high). Graph structure of Job 1: Chunk 1, Chunk 2, Copy 3, Chunk 4. Graph structure of Job 2: Chunk 1, Mutation 2, Update 3, Chunk 4. Shared graph structure: Chunk 1, Chunk 2, Update 3, Chunk 4.]

Example: Job 1 is submitted before Job 2, Chunk 3 is updated after Job 1 is submitted, and Chunk 2 is modified by Job 2
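The isolation rule can be mimicked with a copy-on-write style overlay. The sketch below is only a model of the behaviour described above, using plain dictionaries instead of virtual memory mappings; the class and method names are hypothetical.

```python
class SharedGraph:
    """Shared chunks plus per-job private overrides (copy-on-write style)."""

    def __init__(self, chunks):
        self.chunks = dict(chunks)   # chunk id -> current shared contents
        self.private = {}            # job -> {chunk id -> private contents}

    def submit(self, job):
        self.private[job] = {}

    def mutate(self, job, chunk_id, contents):
        """A job's own mutation is visible to that job only."""
        self.private[job][chunk_id] = contents

    def update(self, chunk_id, contents):
        """A graph update: already-submitted jobs keep a copy of the old chunk."""
        old = self.chunks[chunk_id]
        for overrides in self.private.values():
            overrides.setdefault(chunk_id, old)
        self.chunks[chunk_id] = contents

    def view(self, job):
        """The graph structure as seen by one job."""
        return {cid: self.private[job].get(cid, data)
                for cid, data in self.chunks.items()}


g = SharedGraph({1: "Chunk 1", 2: "Chunk 2", 3: "Chunk 3", 4: "Chunk 4"})
g.submit("Job 1")
g.update(3, "Update 3")             # Chunk 3 is updated after Job 1 is submitted
g.submit("Job 2")
g.mutate("Job 2", 2, "Mutation 2")  # Chunk 2 is modified by Job 2
```

With this sequence, Job 1 still sees the original Chunk 3 and Chunk 2, while Job 2 sees the updated Chunk 3 and its own mutation of Chunk 2. Chunks 1 and 4 are never copied.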

Scheduling Strategy for Partition Loading

• Partitions are given a higher priority

- when they are handled by jobs with fewer active partitions

- when they are processed by more jobs

Example: Partition 1 is activated by the other partitions of job 1 in the x-th iteration and can be handled in the (x + 1)-th iteration for job 1
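One way to turn the two rules above into a loading order is sketched below. The scoring function is an assumption chosen only because it satisfies both rules (more sharing jobs and fewer active partitions both raise the score); it is not presented as GraphM's actual priority formula, and all names are hypothetical.

```python
def partition_priorities(active):
    """active maps each job to the set of partitions it will handle next.

    Returns {partition: score}; a higher score means load earlier.
    Scoring (an assumption): jobs_sharing * max(1 / active_partitions_of_job).
    """
    scores = {}
    partitions = set().union(*active.values()) if active else set()
    for p in partitions:
        jobs = [j for j, parts in active.items() if p in parts]
        # A job with few active partitions makes its partitions more urgent.
        urgency = max(1 / len(active[j]) for j in jobs)
        scores[p] = len(jobs) * urgency
    return scores


active = {
    "job1": {"P1"},
    "job2": {"P1", "P2"},
    "job3": {"P2", "P3", "P4"},
}
scores = partition_priorities(active)
# Sort by descending score, breaking ties by partition name.
order = sorted(scores, key=lambda p: (-scores[p], p))
# order == ["P1", "P2", "P3", "P4"]
```

P1 comes first because two jobs need it and one of them has nothing else to do; P3 and P4 come last because only one job, which has three active partitions, needs them.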

System Architecture

[Figure: System architecture. User applications run on an existing graph processing framework (graph API and existing graph processing engine), which sits on top of our graph storage system and the OS. The graph storage system comprises a graph preprocessor, a graph sharing controller, and a synchronization manager. On disk, the original graph data is held in the specific graph representation together with chunk tables; in memory, each graph partition is divided into chunks (Chunk 1 to Chunk 4); chunks are streamed into the LLC for the CPU.]

Integrated with Existing Framework

No burden on programmers + Minor framework change

An example illustrating how to integrate GraphM into an existing graph processing framework

(a) Pseudocode of GridGraph

    /* Edge streaming function in GridGraph */
    StreamEdges() {
        /* Set up the active partitions */
        for (each active partition) {
            /* The original data load operation */
            partition ← load()
            for (each edge ∈ partition)
                /* Process the streamed edges */
        }
    }

(b) Pseudocode of GridGraph integrated with GraphM

    GraphM.Init()  /* Initialization of GraphM */
    StreamEdges() {
        /* Set up the active partitions */
        GraphM.GetActiveVertices()
        for (each active partition) {
            partition ← GraphM.Sharing(G, load())
            /* Notify GraphM to start synchronization */
            GraphM.Start()
            for (each edge ∈ partition)
                /* Process the streamed edges */
            /* Notify GraphM to end synchronization */
            GraphM.Barrier()
        }
    }
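To make the effect of the integration concrete, here is a single-threaded mock of the modified loop. Everything in it is hypothetical: `GraphMStub` is a stand-in written for this example, not GraphM's implementation, `start` and `barrier` are empty placeholders, and the loader is passed as a callable rather than as the result of `load()`.

```python
class GraphMStub:
    """Hypothetical stand-in for the GraphM calls named in the pseudocode."""

    def __init__(self):
        self.cache = {}   # partition id -> loaded edges, shared by all jobs
        self.loads = 0    # how many times data was actually loaded

    def sharing(self, partition_id, load):
        """Return the shared copy of a partition, loading it at most once."""
        if partition_id not in self.cache:
            self.cache[partition_id] = load(partition_id)
            self.loads += 1
        return self.cache[partition_id]

    def start(self):
        pass              # placeholder: synchronization would begin here

    def barrier(self):
        pass              # placeholder: synchronization would end here


# Toy "disk" contents, made up for this example.
DISK = {"P1": [(0, 1), (0, 2)], "P2": [(3, 1), (4, 2)]}


def load(partition_id):
    return DISK[partition_id]


def stream_edges(graphm, active_partitions, process_edge):
    for pid in active_partitions:
        partition = graphm.sharing(pid, load)
        graphm.start()
        for edge in partition:
            process_edge(edge)
        graphm.barrier()


graphm = GraphMStub()
seen_by_job1, seen_by_job2 = [], []
stream_edges(graphm, ["P1", "P2"], seen_by_job1.append)
stream_edges(graphm, ["P1"], seen_by_job2.append)
```

The second job reuses the copy of P1 that the first job caused to be loaded, so `graphm.loads` ends at 2 rather than 3. The job-side loop is otherwise unchanged, which is the point of the "minor framework change" claim.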

Experiment Setup

• Machine information

- CPU: 2-way 8-core Intel(R) Xeon(R) CPU E5-2670 @ 2.60GHz

- Each CPU has a 20 MB last-level cache

- Main memory: 32 GB

• Typical graph processing algorithms

- PageRank, WCC, BFS, SSSP

• Datasets

- 5 real-world datasets

• Evaluated graph processing systems

- GraphChi, GridGraph, PowerGraph, Chaos

- Schemes shown for GridGraph: GridGraph-M, GridGraph-S, GridGraph-C

Properties of data sets

Dataset     Vertices   Edges     Data size   Relative to main memory
LiveJ       4.8 M      69 M      526 MB      < 32 GB
Orkut       3.1 M      117.2 M   894 MB      < 32 GB
Twitter     41.7 M     1.5 B     10.9 GB     < 32 GB
UK-union    133.6 M    5.5 B     40.1 GB     > 32 GB
Clueweb12   978.4 M    42.6 B    317 GB      > 32 GB

Evaluation: Overall Performance

[Figure: Total execution time for the 16 jobs with different schemes. Normalized execution time of GridGraph-S, GridGraph-C, and GridGraph-M on LiveJ, Orkut, Twitter, UK-union, and Clueweb.]

[Figure: Execution time breakdown of jobs with different schemes. Time consumption ratio, split into graph processing time and data accessing time, for GridGraph-S, GridGraph-C, and GridGraph-M on the same data sets.]

• Shorter total execution time

• Much lower graph data accessing cost

The data accessing time is reduced by 11.48 times and 13.06 times

Evaluation: Volume of Data Access

[Figure: Volume of data swapped into the LLC for 16 jobs with different schemes. Normalized volume for GridGraph-S, GridGraph-C, and GridGraph-M on LiveJ, Orkut, Twitter, UK-union, and Clueweb.]

[Figure: Total I/O overhead for 16 jobs with different schemes. Normalized I/O overhead for GridGraph-S, GridGraph-C, and GridGraph-M on the same data sets.]

• Smaller volume of data access

• Greatly reduced I/O overhead in the case of out-of-core processing

Evaluation: Scalability

[Figure: Total execution time (hours) for different numbers of jobs (1, 2, 4, 8, 16) with GridGraph-S, GridGraph-C, and GridGraph-M.]

[Figure: Total execution time (hours) on different numbers of CPU cores (1, 2, 4, 8, 16) with GridGraph-S, GridGraph-C, and GridGraph-M.]

• Better speedup is achieved when the number of jobs increases

• Simply adopting the original frameworks to support concurrent jobs may be a terrible choice

Evaluation: Integration with Other Frameworks

Execution time (in seconds) of 64 jobs for different frameworks, where “—” means the run could not be executed because of memory errors. PowerGraph and Chaos are run on a cluster with 128 nodes connected via 1-Gigabit Ethernet.

Scheme         LiveJ   Orkut   Twitter   UK-union   Clueweb12
GraphChi-S     2,348   2,248   43,032    149,352    > 1 week
GraphChi-C     776     696     10,580    38,760     > 1 week
GraphChi-M     344     468     6,128     12,436     248,840
PowerGraph-S   92      144     1,408     7,183      —
PowerGraph-C   83      111     1,153     6,653      —
PowerGraph-M   43      75      795       3,820      —
Chaos-S        224     159     4,668     29,538     487,272
Chaos-C        516     588     12,011    30,943     > 1 week
Chaos-M        121     106     2,261     10,614     156,881
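The table is easier to read as ratios. The short calculation below divides the -S and -C times by the -M time for the LiveJ column only; the numbers are copied from the table, and the helper name is made up for this example.

```python
# Execution times in seconds, copied from the table above (LiveJ column).
livej = {
    "GraphChi":   {"S": 2348, "C": 776, "M": 344},
    "PowerGraph": {"S": 92,   "C": 83,  "M": 43},
    "Chaos":      {"S": 224,  "C": 516, "M": 121},
}


def speedups(times):
    """Ratio of the -S and -C times to the -M time, rounded to 2 decimals."""
    return {k: round(times[k] / times["M"], 2) for k in ("S", "C")}


results = {framework: speedups(times) for framework, times in livej.items()}
# results["GraphChi"]   == {"S": 6.83, "C": 2.26}
# results["PowerGraph"] == {"S": 2.14, "C": 1.93}
# results["Chaos"]      == {"S": 1.85, "C": 4.26}
```

On this data set the -M variant is between roughly 1.9 and 6.8 times faster than the other two variants, with the largest gap for GraphChi-S.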

Evaluation: Pre-processing Cost

• Pre-processing → format conversion, graph partition labelling

• Result → small extra overhead

- Can be amortized by reuse

Extra storage cost

              LiveJ     Orkut     Twitter   UK-union   Clueweb12
Extra size    70.6 MB   49.2 MB   2.09 GB   4.5 GB     19.9 GB
Extra ratio   13.4%     5.5%      19.2%     11.2%      6.3%

Preprocessing time (in seconds)

              LiveJ   Orkut   Twitter   UK-union   Clueweb12
GridGraph     20.89   35.07   439.59    2,312.11   19,267.28
GridGraph-M   21.86   35.76   463.65    2,681.04   22,401.90
Extra ratio   4.6%    2.0%    5.5%      16.0%      16.3%

Conclusion

➢ What GraphM brings to graph processing

• Analysis of spatial/temporal similarities between concurrent graph processing jobs

• A novel Share-Synchronize mechanism for concurrent graph processing

• A scheduling strategy for out-of-core graph processing

• Requires no application change and only minor changes to the framework

➢ Future work

• How to exploit new hardware (e.g., FPGAs or even ASICs) to accelerate data accesses of concurrent jobs for higher throughput

• How to further optimize GraphM for distributed platforms and for evolving graph processing

• How to further ensure security

THANK YOU!

Service Computing Technology and System Lab., MoE (SCTS)

Cluster and Grid Computing Lab., Hubei Province (CGCL)

BACKUP SLIDES

Fine-grained Synchronization Execution

• Profiling Phase

• Syncing Phase

Ci: The set of chunks in the partition Pi

Vk: The set of vertices in the kth chunk

Aj: The set of active vertices for the jth job

: The number of out-going edges of the vertex v in the kth chunk

Priorities of Graph Partitions

• Ji: The set of jobs to handle Pi in the next iteration

• Nj(P): The number of active partitions of the jth job

• N(Ji): The number of jobs in the set Ji

Evaluation: Scheduling Strategy

[Figure: Total execution time for the 16 jobs without/with our scheduling. Normalized execution time of GridGraph-M-without and GridGraph-M on LiveJ, Orkut, Twitter, UK-union, and Clueweb.]