Mosaic: Processing a Trillion-Edge Graph on a Single Machine

Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, Taesoo Kim

Georgia Institute of Technology

Best Student Paper

April 26, 2017

Large-scale graph processing is ubiquitous

Social networks

Genome analysis

Graphs enable Machine Learning

Powerful, heterogeneous machines

Terabytes of RAM on multiple sockets

Powerful many-core coprocessors

Fast, large-capacity non-volatile memory

Take advantage of heterogeneous machines to process tera-scale graphs

Table of contents

1 Graph Processing: Sample Application

2 Design
    Mosaic Architecture
    Graph Encoding
    API

3 Evaluation

Graph Processing: Applications

Community Detection

Find Common Friends

Find Shortest Paths

Estimate Impact of Vertices (webpages, users, ...)

...

Mosaic: Design space

Graph processing has many faces:

Single machine
    Out-of-core ⇒ Cheap, but potentially slow
    In memory ⇒ Fast, but limited graph size

Cluster
    Out-of-core ⇒ Large graphs, but expensive & slow
    In memory ⇒ Large graphs & fast, but very expensive

⇒ Single machine, out-of-core is most cost-effective

⇒ Goal: Good performance and large graphs!

Mosaic: Design goals

Goal

Run algorithms on very large graphs on a single machine using coprocessors

Enabled by:

Common, familiar API (vertex/edge-centric)

Encoding: Lossless compression

Cache locality

Processing on isolated subgraphs


Architecture of Mosaic

Usage of Xeon Phi & NVMe

Involvement of Host

[Figure: Mosaic architecture. Labels recoverable from the diagram: NVMe, Xeon Phi, PCIe, host processors (Xeon), tile transfer, meta transfer, fetch, receive, edge processing, global vertex state (<current state>, <next state>), tiles T1, T2, ..., tile meta data I1, I2, ..., and the multiplicities ×61 cores, ×6, and ×4.]

Graph encoding: Idea

Compression

Split graph into subgraphs, use local (short) identifiers

Cache locality

Inside subgraphs: Sort by access order

Between subgraphs: Overlap vertex sets
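
The compression idea can be illustrated with a minimal sketch: inside one subgraph, global vertex ids are remapped to short local ids, and a small lookup table keeps the mapping lossless. This is not Mosaic's actual on-disk format. The 4-byte global and 2-byte local id widths, the names `encode_subgraph` and `decode_edge`, and the edge list are assumptions made for illustration only.

```python
import struct

def encode_subgraph(edges):
    """Remap global (src, tgt) ids to dense local ids for one subgraph.

    Returns (local_to_global, local_edges); a local id is an index into
    local_to_global, so the encoding is lossless.
    """
    global_to_local = {}
    local_to_global = []
    local_edges = []
    for src, tgt in edges:
        for v in (src, tgt):
            if v not in global_to_local:
                global_to_local[v] = len(local_to_global)
                local_to_global.append(v)
        local_edges.append((global_to_local[src], global_to_local[tgt]))
    return local_to_global, local_edges

def decode_edge(local_to_global, local_edge):
    """Translate one local edge back to global ids."""
    src, tgt = local_edge
    return local_to_global[src], local_to_global[tgt]

# Assumed id widths: 4-byte global ids, 2-byte local ids.
GLOBAL_ID_BYTES = struct.calcsize("<I")
LOCAL_ID_BYTES = struct.calcsize("<H")

# Hypothetical subgraph: 6 edges among 4 vertices with large global ids.
a, b, c, d = 1000001, 1000002, 1000005, 1000004
edges = [(a, b), (a, c), (d, b), (c, d), (b, c), (d, a)]

table, local_edges = encode_subgraph(edges)
raw_bytes = len(edges) * 2 * GLOBAL_ID_BYTES
encoded_bytes = len(table) * GLOBAL_ID_BYTES + len(local_edges) * 2 * LOCAL_ID_BYTES
print(raw_bytes, encoded_bytes)
```

The lookup table is a fixed cost per subgraph, so the saving grows with the number of edges that share the same vertices.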


Background: Column first

Locality for write

Multiple sequential reads

[Figure: global adjacency matrix over source and target vertices 1 to 12, split into partitions P11 to P44 (S = 3) and traversed column first.]

⇒ Problem: No locality when switching column

Background: Row first

Locality for read

Multiple sequential writes

[Figure: global adjacency matrix over source and target vertices 1 to 12, split into partitions P11 to P44 (S = 3) and traversed row first.]

⇒ Problem: No locality when switching row

Background: Hilbert order

Space-filling curve

Provides locality between adjacent data points

[Figure: global adjacency matrix over source and target vertices 1 to 12, split into partitions P11 to P44 (S = 3) and traversed along a Hilbert curve.]
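
As a rough illustration of the Hilbert order, the sketch below maps a cell of an n × n grid (n a power of two) to its position along the curve, using the widely known iterative formulation. The function name `hilbert_index` and the choice of which axis is x or y are assumptions; the slides do not show Mosaic's own code for this.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along a Hilbert curve that
    covers an n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order the 16 cells of a 4 x 4 partition grid along the curve.
n = 4
cells = [(x, y) for x in range(n) for y in range(n)]
hilbert_order = sorted(cells, key=lambda c: hilbert_index(n, *c))
print(hilbert_order)
```

Consecutive cells in `hilbert_order` always share a side, so there is no jump comparable to the one at the end of a row or column in the row-first and column-first orders.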

From global to local: Tiles

Convert graph to set of tiles

1) Start with adjacency matrix

2) Use first edge in tile T1

3) Consume as many edges as possible

4) Next edges do not fit in T1, construct T2

[Figure: global adjacency matrix (source vertex vs. target vertex, global ids 1 to 12, partitions P11 to P44 with S = 3) next to the local tiles Tile-1 (T1) and Tile-2 (T2), each with its meta data (I1, I2). In the example, the meta data of Tile-1 covers global vertices 1, 2, 4, 5 and that of Tile-2 covers global vertices 3, 4, 5, 6, with local ids ① to ④ in each tile. Legend: ① local vertex id; (①, 1) local → global id; ➊ local edge store order.]
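
The four tile-construction steps can be sketched as a greedy loop. This is a simplified illustration, not Mosaic's actual implementation: it assumes the edges arrive already sorted in the desired (for example Hilbert) order, keeps a single vertex set per tile, and uses 0-based local ids. The name `build_tiles`, the capacity parameter `max_local_ids`, and the edge list are hypothetical.

```python
def build_tiles(edges, max_local_ids):
    """Greedily pack ordered edges into tiles.

    Each tile is (meta, local_edges): meta[i] is the global id of local
    vertex i, and local_edges holds (local_src, local_tgt) pairs.
    """
    if max_local_ids < 2:
        raise ValueError("a tile must hold at least one edge (two vertices)")

    tiles = []
    meta, index, local_edges = [], {}, []
    for src, tgt in edges:
        missing = {v for v in (src, tgt) if v not in index}
        if len(index) + len(missing) > max_local_ids:
            # The edge does not fit: close this tile and start the next one.
            tiles.append((meta, local_edges))
            meta, index, local_edges = [], {}, []
        for v in (src, tgt):
            if v not in index:
                index[v] = len(meta)
                meta.append(v)
        local_edges.append((index[src], index[tgt]))

    if local_edges:
        tiles.append((meta, local_edges))
    return tiles

# Hypothetical edge list, already in traversal order; at most 4 vertices per tile.
edges = [(1, 2), (1, 5), (4, 2), (5, 4), (4, 6), (5, 3), (6, 3), (3, 4)]
tiles = build_tiles(edges, max_local_ids=4)
for meta, local_edges in tiles:
    print(meta, local_edges)
```

Because consecutive tiles are filled from neighbouring edges, their vertex sets tend to overlap (here both tiles contain global vertices 4 and 5).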

Locality with Hilbert-ordered tiles

Overlapping sets of sources and targets

[Figure: global adjacency matrix over source and target vertices 1 to 12, split into partitions P11 to P44 (S = 3), with edges numbered in Hilbert order.]

⇒ Better locality than row-first or column-first

API: Pagerank example

Pull: Gather per edge information

Reduce: Combine results from multiple subgraphs

Apply: Calculate non-associative regularization

// On edge processor (co-processor)
// Edge e = (Vertex src, Vertex tgt)
def Pull(Vertex src, Vertex tgt):
    return src.val / src.out_degree

// On edge processor/global reducers (both)
def Reduce(Vertex v1, Vertex v2):
    return v1.val + v2.val

// On global reducers (host)
def Apply(Vertex v):
    v.val = (1 - α) + α × v.val

[Slide annotations: edge-centric operation vs. vertex-centric operation; local graph processing on a tile vs. global graph processing.]

Formula:

$$\mathrm{Pagerank}_v = \alpha \cdot \left( \sum_{u \in \mathrm{Neighborhood}(v)} \frac{\mathrm{Pagerank}_u}{\mathrm{degree}_u} \right) + (1 - \alpha)$$
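
To make the Pull/Reduce/Apply split concrete, here is a small sequential Python sketch of one Pagerank iteration. It is not Mosaic's API: the `Vertex` dataclass, the driver `pagerank_iteration`, the function names, and the value α = 0.85 are assumptions, and the reduce step here combines two partial sums rather than two `Vertex` objects as on the slide.

```python
from dataclasses import dataclass

ALPHA = 0.85  # assumed damping factor; the slides only name it α

@dataclass
class Vertex:
    val: float
    out_degree: int

def pull(src, tgt):
    """Edge-centric step: contribution of src along one edge (src, tgt)."""
    return src.val / src.out_degree

def reduce_vals(a, b):
    """Combine two partial results; associative, so order does not matter."""
    return a + b

def apply_update(gathered):
    """Vertex-centric step: turn the gathered sum into the new value."""
    return (1 - ALPHA) + ALPHA * gathered

def pagerank_iteration(vertices, edges):
    # Gather into a separate "next state" so that pulls only read
    # the current state.
    gathered = {v: 0.0 for v in vertices}
    for src, tgt in edges:
        gathered[tgt] = reduce_vals(gathered[tgt], pull(vertices[src], vertices[tgt]))
    for v, vertex in vertices.items():
        vertex.val = apply_update(gathered[v])

# Tiny hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
vertices = {0: Vertex(1.0, 2), 1: Vertex(1.0, 1), 2: Vertex(1.0, 1)}
pagerank_iteration(vertices, edges)
print({v: round(vx.val, 3) for v, vx in vertices.items()})
```

Keeping the reduce step associative is what allows partial sums from different subgraphs to be combined in any order before the final update runs once per vertex.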

Evaluation: Preprocessing

Mosaic needs explicit preprocessing step

2-4 min for small datasets, 51 minutes for webgraph, 31 hours for trillion edges

But: Can be amortized during execution:

GridGraph: Mosaic faster after
    twitter: 20 iterations
    uk2007: 8 iterations

X-Stream: Mosaic faster after
    twitter: 8 iterations
    uk2007: 5 iterations
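
The amortization argument amounts to a break-even calculation: a system with a higher preprocessing cost but faster iterations wins once the per-iteration savings exceed the extra preprocessing time. The sketch below shows that arithmetic. The slides report only the resulting iteration counts, so the function name `break_even_iterations` and all timings used here are hypothetical.

```python
import math

def break_even_iterations(prep_a, iter_a, prep_b, iter_b):
    """Smallest iteration count n >= 1 at which system A is strictly faster
    overall than system B, i.e. prep_a + n * iter_a < prep_b + n * iter_b.

    Returns None if A never catches up (its iterations are not faster).
    """
    if iter_a >= iter_b:
        return None
    x = (prep_a - prep_b) / (iter_b - iter_a)
    return max(1, math.floor(x) + 1)

# Hypothetical timings in seconds (not taken from the slides):
# A preprocesses for 120 s and needs 2 s per iteration,
# B preprocesses for 20 s and needs 7 s per iteration.
n = break_even_iterations(prep_a=120, iter_a=2, prep_b=20, iter_b=7)
print(n)
```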

Evaluation: Size of datasets

Hilbert-ordered tiles allow efficient encoding of local graphs

Effect: up to 68% reduction in data size

Graph            #vertices    #edges      Raw data     Mosaic size (red.)
*rmat24          16.8 M       0.3 B       2.0 GB       1.1 GB (−45.0%)
twitter          41.6 M       1.5 B       10.9 GB      7.7 GB (−29.4%)
*rmat27          134.2 M      2.1 B       16.0 GB      11.1 GB (−30.6%)
uk2007-05        105.8 M      3.7 B       27.9 GB      8.7 GB (−68.8%)
hyperlink14      1,724.6 M    64.4 B      480.0 GB     152.4 GB (−68.3%)
*rmat-trillion   4,294.9 M    1,000.0 B   8,000.0 GB   4,816.7 GB (−39.8%)
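
The reduction column follows directly from the two size columns as 1 − (Mosaic size / raw size). The short check below recomputes it from the rounded table values; the variable names are illustrative, and small deviations from the reported figures are expected because the sizes on the slide are already rounded.

```python
# (graph, raw size in GB, Mosaic size in GB, reported reduction in %)
datasets = [
    ("rmat24", 2.0, 1.1, 45.0),
    ("twitter", 10.9, 7.7, 29.4),
    ("rmat27", 16.0, 11.1, 30.6),
    ("uk2007-05", 27.9, 8.7, 68.8),
    ("hyperlink14", 480.0, 152.4, 68.3),
    ("rmat-trillion", 8000.0, 4816.7, 39.8),
]

def reduction_percent(raw_gb, encoded_gb):
    """Size reduction of the encoded data relative to the raw data, in percent."""
    return (1 - encoded_gb / raw_gb) * 100

computed = {name: reduction_percent(raw, enc) for name, raw, enc, _ in datasets}
for name, raw, enc, reported in datasets:
    print(f"{name}: computed {computed[name]:.1f}%, reported {reported:.1f}%")
```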

Hilbert-ordered tiles: Cache locality

Cache misses and execution times for three different strategies

[Figure: two bar charts comparing Hilbert, row-first, and column-first order for Pagerank, BFS, and WCC. Left: cache misses (%). Right: runtime (s).]

⇒ Hilbert-ordered tiles have up to 45% better cache locality, up to 43% reduction in runtime

Performance comparison

Comparison to other single machine engines with Pagerank:

[Figure: Pagerank runtime in seconds (shown on a linear and on a log scale) for Mosaic, GridGraph, X-Stream, and GraphChi on rmat24, twitter, rmat27, and uk2007-05.]

⇒ Mosaic outperforms other systems by 2.7× to 58.6×

Conclusion

Mosaic, a graph processing engine for trillion-edge graphs on a single machine

Hilbert-ordered tiles:

Enable localized processing on coprocessors

Optimize cache locality

Enable compression

Thank you!
