Cost Efficient Large-Scale Graph Analytics · 2015. 3. 17.

Date post: 29-Jan-2021
  • Technica Corporation Confidential and Proprietary. Copyright © 2015 Technica Corporation. All Rights Reserved.

    Cost Efficient Large-Scale Graph Analytics

    Dr. Joseph Schneible

  • Applications of Graph Analysis

    ■ Social Networks

    ■ WWW

    ■ Medicine

    ■ Natural Language

    ■ Cybersecurity

    ■ Homeland Security

    ■ Local Government

  • OUTLINE:

    Performance

    System Design

    Graph Analysis

  • How to bring meaning to this?

  • Example Algorithms

    PageRank: Find Influential Nodes Within a Network

    Community Detection: Find Dense Sub-graphs

    Belief Propagation: Perform Inference on a Graph

  • PageRank

    ■ PageRank trends linearly with vertex degree

    ■ Anomalous nodes lie above this trend line

    ■ Used to find the mastermind of the 9/11 attacks

    ■ Can be applied to biological networks, etc.

    [Figure: plot of PageRank (vertical axis) versus Degree (horizontal axis)]
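    To make the PageRank-versus-degree comparison concrete, here is a minimal power-iteration sketch. The damping factor of 0.85, the function name `pagerank`, and the toy edge list are assumptions for illustration; the talk does not say how its implementation is written.

```python
from collections import defaultdict

def pagerank(edges, num_iters=5, damping=0.85):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(num_iters):
        # Rank held by vertices without out-edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if out_degree[v] == 0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank

# Toy graph (hypothetical): vertex "a" has the highest in-degree.
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b"), ("c", "b")]
ranks = pagerank(edges)
in_degree = defaultdict(int)
for _, dst in edges:
    in_degree[dst] += 1
for v in sorted(ranks):
    print(v, "in-degree:", in_degree[v], "rank:", round(ranks[v], 3))
```

    Plotting rank against degree for every vertex gives the kind of trend line the slide describes; vertices that sit well above it would be the candidates for closer inspection.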

  • Goals

    Affordability

    Time Efficiency

    Customizability

    Meaning

  • System Approach

    ■ All of the goals are affected by design choices:

    GPU Utilization

    Memory Efficiency

    Commodity Hardware

    I/O Efficiency

    Graph Construction

  • Challenges

    ■ Parallelization: Edges or Vertices?

    ■ Memory Limitations: GB of RAM and vRAM, TB Graphs

    ■ Irregular Graph Structure: Many Nodes with Few Connections, Few Nodes with Many Connections

  • Parallelization Strategies

    ■ Edge-wise Distribution:

      One Operation per Edge

      Memory for Temporary Data Structures

      Even Load Balance

    ■ Vertex-wise Distribution:

      Multiple Operations per Vertex

      Uneven Load Balance

    [Figure: threads assigned to edges 1-8 (edge-wise) and to vertices A-E (vertex-wise) of an example graph]
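    The load-balance difference between the two strategies can be sketched by counting operations per thread. The edge list below is hypothetical (the example graph from the slide is not recoverable from the transcript), and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical directed edge list over vertices A-E with 8 edges.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]

def edge_wise_work(edges):
    """One thread per edge: every thread performs exactly one operation."""
    return [1 for _ in edges]

def vertex_wise_work(edges):
    """One thread per vertex: a thread handles all out-edges of its vertex."""
    out_degree = Counter(src for src, _ in edges)
    vertices = sorted({v for edge in edges for v in edge})
    return [out_degree[v] for v in vertices]

print("edge-wise  :", edge_wise_work(edges))    # 8 threads, equal work
print("vertex-wise:", vertex_wise_work(edges))  # 5 threads, unequal work
```

    Both schemes perform the same total number of operations, but the slowest thread bounds the running time: one operation in the edge-wise case versus four (vertex A) in the vertex-wise case.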

  • Load Balancing Graph Analysis on the GPU

    ■ High-degree vertices will dominate computation time

    ■ Multiple kernels were created

    ■ A threshold separates high-degree from low-degree vertices

    [Figure: execution timelines (axis: Time) with labels v2 to v10 and e1 to e9; panel titles "Load Balancing" and "Multiple Kernels", the latter separating v1 from v2 to v10]
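    A degree threshold of this kind can be sketched as a simple partition of the vertex set. The threshold value, the function name `split_by_degree`, and the star-shaped toy graph are assumptions for illustration; the talk does not state how its threshold is chosen.

```python
from collections import Counter

def split_by_degree(edges, threshold):
    """Partition vertices into low- and high-degree work lists.

    Vertices whose out-degree exceeds `threshold` go to the high-degree list.
    """
    out_degree = Counter(src for src, _ in edges)
    low = sorted(v for v, d in out_degree.items() if d <= threshold)
    high = sorted(v for v, d in out_degree.items() if d > threshold)
    return low, high

# Hypothetical graph: v1 has nine out-edges, v2..v10 have one each.
edges = [("v1", f"v{i}") for i in range(2, 11)]
edges += [(f"v{i}", "v1") for i in range(2, 11)]
low, high = split_by_degree(edges, threshold=4)
print("low-degree work list :", low)
print("high-degree work list:", high)
```

    Each work list could then be handed to its own kernel, so that a single very high-degree vertex such as v1 does not stall the threads that process the many low-degree vertices.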

  • Out-of-Core Graph Processing

    ■ Divide graph into intervals (sets) of vertices

    ■ Gather associated in-edges into a shard

    ■ Order edges in shards such that out-edges are located together in windows

    [Figure: vertices divided into Interval 1, Interval 2, Interval 3, Interval 4]
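    The three steps above can be sketched in a few lines. This version assumes integer vertex IDs and equal-sized intervals, and the names `build_shards` and `window` are hypothetical; a production system would more likely size intervals by edge count and keep shards on disk.

```python
def build_shards(edges, num_vertices, num_intervals):
    """Group in-edges by destination interval and sort each shard by source."""
    size = -(-num_vertices // num_intervals)  # ceiling division
    intervals = [(min(i * size, num_vertices),
                  min((i + 1) * size, num_vertices))
                 for i in range(num_intervals)]
    shards = [[] for _ in range(num_intervals)]
    for src, dst in edges:
        shards[dst // size].append((src, dst))  # in-edges of dst's interval
    for shard in shards:
        shard.sort()  # ordering by source keeps each interval's out-edges together
    return intervals, shards

def window(shard, interval):
    """Edges of a shard whose source vertex lies in `interval`."""
    lo, hi = interval
    return [(s, d) for s, d in shard if lo <= s < hi]

# Toy graph (hypothetical): 8 vertices, 10 directed edges, 4 intervals.
edges = [(0, 1), (0, 5), (1, 2), (2, 3), (3, 0),
         (4, 6), (5, 7), (6, 4), (7, 2), (7, 5)]
intervals, shards = build_shards(edges, num_vertices=8, num_intervals=4)
for interval, shard in zip(intervals, shards):
    print(interval, shard)
print("window of interval 4 in shard 3:", window(shards[2], intervals[3]))
```

    Because each shard is sorted by source, the out-edges of any one interval occupy a single contiguous run inside every other shard, which is what allows them to be read from storage in a few sequential passes.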

  • Compression

    ■ Power-law distribution is common in natural graphs

    ■ Compression scheme exploits distribution

    [Figure: plot with Destination Vertices (horizontal axis) and Source Vertices (vertical axis)]

  • Task Analysis

    ■ Use:

      Graph Meta-data

      Performance Models

      Micro-benchmarks

    ■ To:

      Divide work between CPU and GPU

      Divide work between kernels
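    The talk does not describe its performance model. Purely as an illustration of how micro-benchmark results could drive a CPU/GPU split, the sketch below divides edges in proportion to measured throughput; the function name `split_work` and the throughput figures are hypothetical.

```python
def split_work(num_edges, cpu_edges_per_sec, gpu_edges_per_sec):
    """Split edges so both devices are predicted to finish at the same time."""
    gpu_share = gpu_edges_per_sec / (cpu_edges_per_sec + gpu_edges_per_sec)
    gpu_edges = round(num_edges * gpu_share)
    return num_edges - gpu_edges, gpu_edges

# Hypothetical micro-benchmark result: the GPU processes edges 3x faster.
cpu_edges, gpu_edges = split_work(1000, cpu_edges_per_sec=1.0,
                                  gpu_edges_per_sec=3.0)
print("CPU edges:", cpu_edges, "GPU edges:", gpu_edges)
```

    A fuller model would also account for graph meta-data such as the degree distribution and for host-to-device transfer time, neither of which this sketch considers.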

  • PageRank Performance

    ■ PageRank: 5 Iterations

    ■ LiveJournal Graph:

      Vertices: 4.6 Million

      Edges: 77.4 Million

    ■ FUNL Desktop System:

      GPU: GeForce GTX TITAN, 2688 CUDA Cores, 928MHz, 6GB vRAM

      CPU: Core i7, Quad Core, 3.40GHz

      RAM: 16GB (4x4GB), 1333MHz

      Storage: HDD, 180MB/s

    ■ Spark Cluster:

      System: AWS EC2 m1.large

      Number of Nodes: 10

      Network: Moderate Performance

    ■ Results:

      FUNL Desktop System: 9.5 Seconds

      Spark Cluster: 110.4 Seconds
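    The ratio between the two reported times can be worked out directly. Only the two timings come from the slide; the variable names are illustrative.

```python
funl_seconds = 9.5      # FUNL desktop system, as reported on the slide
spark_seconds = 110.4   # 10-node Spark cluster, as reported on the slide

speedup = spark_seconds / funl_seconds
print(f"speedup: {speedup:.1f}x")
```

    On these figures the single desktop system finishes the five PageRank iterations between 11 and 12 times faster than the ten-node cluster.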

  • Belief Propagation Performance

    ■ FUNL/Quad Core CPU:

      GPU: GeForce GTX TITAN, 2688 CUDA Cores, 928MHz, 6GB vRAM

      CPU: Core i7, Quad Core, 3.40GHz

      RAM: 16GB (4x4GB), 1333MHz

      Storage: HDD, 180MB/s

    ■ 16 Core Server:

      CPU: Xeon E5-2690, 16 Cores, 2.9GHz

      RAM: 64GB, 1600MHz

      Storage: HDDx6, RAID0, 690MB/s

    - https://github.com/GraphChi

  • Additional Slides

  • Advantages

    ■ Unique Features:

      Large-Scale Graph Analysis on the GPU

      Task Analysis for Efficient Parallel Processing (on CPUs and GPUs)

      UI and Interactive Visualization to bring Meaning to Big Data

    ■ Benefits:

      Big Data Graph Analysis on a Budget

      Customizability

      Ease of Use (You don’t have to be a data scientist)

      Reduction in Infrastructure and Energy Needs

  • Big Data Appliance for Graph Analytics

    Data-center-friendly appliance with a suite of graph algorithms and flexibility to add custom solutions.

    Purpose-built to solve big data graph problems with commodity hardware.

    Graph analytics solution that supports pattern discovery and inferencing on large-scale data sets.

    Gain insight by discovering unknown relationships in big data.

    Achieve a competitive advantage without a large budget.

    Ease adoption with a small-footprint solution and customizability.

  • Point of Contact

    Joe Schneible

    Enterprise Software Solutions Engineering Group Manager

    Email: [email protected]

    LinkedIn: linkedin.com/in/jschneible

    Technica Corporation

    22970 Indian Creek Dr., Suite 500

    Dulles, VA 20166

    703.662.2000

    technicacorp.com

