Towards Heterogeneous Solvers for Large-Scale Linear Systems Stylianos I. Venieris, Grigorios Mingas, Christos-Savvas Bouganis [email protected] FPL 2015, London 2 Sept 2015

Towards Heterogeneous Solvers for Large-Scale Linear Systems

Stylianos I. Venieris, Grigorios Mingas, Christos-Savvas Bouganis

[email protected]

FPL 2015, London

2 Sept 2015

Introduction – Solving Linear Systems

• Given $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m}$ and $m \ge n$:

$$\min_{x \in \mathbb{R}^{n}} \lVert Ax - b \rVert_2^2$$

• Find vector $x$

[Figure: $A$ (samples × measurements), $b$ (target values) and $x$ (weights); for biological data, the samples are patients, the measurements are DNA nucleotides and the target values are phenotypes]

Introduction – Solving Linear Systems

Linear Systems

Big Data

Computing Infrastructure

• Biology

• Wearables

• Internet

• Bioinformatics

• Econometrics

• Control

Feasibility

Introduction – Data Systems with different structure

• Square system: 𝑚 = 𝑛
• Tall-skinny system: 𝑚 ≫ 𝑛
– Many samples (rows)
– A few measurements (columns)
• Anything in between

Introduction – Linear Systems in Genetic Analysis

• Real-life Example:

− Genetic Analysis [1]

− Search for gene combinations by solving lots of linear systems

• Application Characteristics:

− A lot of linear systems

− Linear systems of varying size

− Up to 100,000 rows and a few hundred columns

[1] L. Bottolo and S. Richardson, “Evolutionary Stochastic Search for Bayesian Model Exploration”, Bayesian Analysis, vol. 5, no. 3, pp. 583–618, Sept. 2010.

[Figure: genes search space]

Our focus

• Towards heterogeneity for high performance across matrix sizes

• Novel FPGA solver for tall-skinny linear systems

• Modelling framework

– Performance and resource estimation (compile-time and runtime)

– Optimal hardware configuration (compile-time)

• Up to 18x speed-up in GFLOPS across matrix sizes compared to existing works

Route Map

• Background

• Towards Heterogeneity for Performance

• An Enhanced FPGA Solver

• Evaluation Results

• Conclusions

Background – Solving Linear Systems

• QR factorisation-based methods are dominant because of their numerical properties
• $A = QR$, with orthogonal $Q$ and upper-triangular $R$
• $Ax = b \Rightarrow QRx = b$
• $Q^T b$: matrix-vector product
• $x = R^{-1} Q^T b$: solution using back-substitution
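The steps on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the solver from the talk: the function name `solve_via_qr` and the random test matrix are assumptions, and `np.linalg.qr` stands in for whichever QR algorithm a given platform uses.

```python
import numpy as np

def solve_via_qr(A, b):
    """Least-squares solution of A x ~ b using a thin QR factorisation."""
    Q, R = np.linalg.qr(A)   # thin QR: Q is m x n, R is n x n upper-triangular
    y = Q.T @ b              # matrix-vector product Q^T b
    n = R.shape[0]
    x = np.zeros(n)
    # Back-substitution: solve R x = y from the last row upwards.
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Hypothetical tall-skinny example (m >> n).
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
b = rng.standard_normal(1000)
x = solve_via_qr(A, b)
```

The back-substitution loop never inverts $R$ explicitly, which is why the slide lists it as the final step.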

The Challenge

• Existing solvers

― Target CPUs, GPUs and FPGAs

― Employ different algorithms

― Tailored to specific matrix sizes

• Key Challenge: Sustain high performance across matrix sizes

Heterogeneity for Performance – Different Solvers

• Platforms: FPGA, GPU, CPU
• Algorithms: CAQR, TSQR, Householder QR
• Problem space: square systems (𝑚 = 𝑛), tall-skinny systems (𝑚 ≫ 𝑛) and anything in between

Heterogeneity for Performance – A Heterogeneous Solver

• Have a heterogeneous bank of solvers
• Select the highest performer based on the matrix size

[Figure: the target application supplies matrices 𝑨 of varying size, from square to tall-skinny systems; a workload allocation stage (MUX) routes each one to the FPGA, GPU or CPU solver in the bank of solvers]

Compute Engines

Compute Engines: FPGA Solver

• Based on an existing architecture for tall-skinny QR factorisations [2]
• Functionality extension
– From QR to linear systems workloads
– Exploits an algorithmic property for acceleration: “Concurrent Solution and Factorisation”
• Splits the matrix into blocks along its rows
• Any no. of rows
• Up to a max no. of columns
– 5x the max no. of columns of existing FPGA work for the same device
• Configurable parameters
– Size of arithmetic units
– No. of blocks active in parallel in the architecture

[2] A. Rafique et al., FPL 2012.
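Splitting the matrix into blocks along its rows is the core idea of tall-skinny QR (TSQR). The sketch below shows a single-level version in NumPy, assuming each block is factorised independently and only the small R factors are combined. The function name `tsqr_r` and the block count are illustrative assumptions; the FPGA architecture in [2] may schedule and combine blocks differently.

```python
import numpy as np

def tsqr_r(A, num_blocks):
    """Return an R factor of A computed block-wise along the rows (TSQR-style)."""
    blocks = np.array_split(A, num_blocks, axis=0)
    # Local factorisation of each row block; only the small R factors are kept.
    local_rs = [np.linalg.qr(block, mode="r") for block in blocks]
    # The R factor of the stacked local R factors is also an R factor of A.
    return np.linalg.qr(np.vstack(local_rs), mode="r")

# Hypothetical tall-skinny example: 1000 rows, 8 columns, 4 row blocks.
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 8))
R = tsqr_r(A, num_blocks=4)
```

Because the local factorisations are independent, the number of blocks processed in parallel becomes a natural tuning parameter, which matches the configurable parameter listed on the slide.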

Concurrent Solution and Factorisation

1. $A = QR$
2. $Ax = b \Rightarrow QRx = b$
3. $Q^T b$: matrix-vector product
4. $x = R^{-1} Q^T b$: solution

• Numerically stable algorithms do not return $Q$ explicitly
• Previous works: additional computations for the reconstruction of $Q$ and for $Q^T b$
• This work: compute $Q^T b$ without forming $Q$ explicitly

[Figure: previous works run a QR algorithm on $A$ to obtain $R$, reconstruct $Q$ from partial results and then multiply by $b$; this work runs a modified QR algorithm on $A$ and $b$ to obtain $R$ and $Q^T b$ directly]
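One common way to obtain $Q^T b$ without ever forming $Q$ is to apply each Householder reflection to $b$ at the same time as it is applied to $A$. The sketch below illustrates that idea in NumPy; whether the modified QR algorithm in the talk works exactly this way is an assumption, and the function name `qr_solve_concurrent` is hypothetical.

```python
import numpy as np

def qr_solve_concurrent(A, b):
    """Householder QR that updates b alongside A, so Q is never formed (m >= n)."""
    R = np.array(A, dtype=float)
    c = np.array(b, dtype=float)
    m, n = R.shape
    for k in range(n):
        # Build the Householder vector that zeroes column k below the diagonal.
        v = R[k:, k].copy()
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue
        v[0] += np.copysign(norm_v, v[0])
        v /= np.linalg.norm(v)
        # Apply the same reflection to the trailing matrix and to b.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        c[k:] -= 2.0 * v * (v @ c[k:])
    # The first n entries of c now hold Q^T b; solve the triangular system R x = Q^T b.
    return np.linalg.solve(np.triu(R[:n, :n]), c[:n])

# Hypothetical example.
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 6))
b = rng.standard_normal(200)
x = qr_solve_concurrent(A, b)
```

The extra work per reflection is one dot product and one vector update on $b$, compared with reconstructing the full $Q$ afterwards.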

Compute Engines: GPU and CPU Solvers

• GPU Solver

– Based on the state-of-the-art work on tall-skinny QR factorisations [3]

– Extended its functionality to Linear Systems by means of the Concurrent Solution and Factorisation

• CPU Solver

– Optimised multithreaded linear algebra library (OpenBLAS)

[3] M. Anderson, C. Ballard, J. Demmel, and K. Keutzer, “Communication-Avoiding QR Decomposition for GPUs”, in Parallel Distributed Processing Symposium (IPDPS), 2011 IEEE International, May 2011, pp. 48–58.

Modelling Framework

• Compile-time Framework
– Performance (in GFLOPS) and resource estimation models for the FPGA solver
– Optimal hardware configuration of the FPGA solver
• Runtime Framework
– Workload allocation among the available solvers
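To make the GFLOPS metric concrete, the sketch below converts a runtime into GFLOPS using the standard leading-order flop count for Householder QR on an $m \times n$ matrix, $2mn^2 - \tfrac{2}{3}n^3$. The slides do not state which flop count the performance model uses, so this formula and the function names are assumptions for illustration only.

```python
def qr_flops(m, n):
    """Approximate flop count of Householder QR on an m x n matrix (m >= n)."""
    return 2.0 * m * n**2 - (2.0 / 3.0) * n**3

def gflops(m, n, seconds):
    """Achieved GFLOPS for a QR factorisation that took `seconds` to run."""
    return qr_flops(m, n) / seconds / 1e9
```

For a fixed runtime, the reported GFLOPS grows with the amount of arithmetic, so comparisons are only meaningful between solvers run on the same matrix size.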

Heterogeneity for Performance – Proposed Design Flow

• Supplied by the application expert: available solvers specifications, FPGA resource info and typical matrix size
• Compile-time modelling framework (application-specific): produces the hardware configuration settings for the FPGA solver instance and the FPGA performance model
• Compile-time GPU and CPU profiling
• Bank of solvers: FPGA solver instance, GPU and CPU solvers
• Runtime workload allocation: for each new workload, solver selection uses the FPGA performance estimate, and the selected solver returns the solution
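The runtime solver selection can be sketched as picking the solver with the highest estimated performance for the incoming matrix size. Everything below is hypothetical: the estimator functions are placeholders with made-up shapes, not the FPGA performance model or the GPU and CPU profiles from the talk. Only the 275-column FPGA limit comes from the experimental setup.

```python
from typing import Callable, Dict, Optional

# An estimator returns predicted GFLOPS, or None if the solver cannot handle the size.
Estimator = Callable[[int, int], Optional[float]]

MAX_FPGA_COLUMNS = 275  # column limit of the FPGA solver reported in the experimental setup

def fpga_estimate(m: int, n: int) -> Optional[float]:
    # Placeholder model (not measured data): favours tall-skinny matrices.
    if n > MAX_FPGA_COLUMNS:
        return None
    return 10.0 * min(1.0, m / (50.0 * n))

def gpu_estimate(m: int, n: int) -> Optional[float]:
    # Placeholder model (not measured data): flat performance.
    return 6.0

def cpu_estimate(m: int, n: int) -> Optional[float]:
    # Placeholder model (not measured data): favours square matrices.
    return 8.0 * min(1.0, n / m) + 1.0

def select_solver(m: int, n: int, estimators: Dict[str, Estimator]) -> str:
    """Return the name of the solver with the highest estimated performance."""
    best_name, best_perf = None, float("-inf")
    for name, estimate in estimators.items():
        perf = estimate(m, n)
        if perf is not None and perf > best_perf:
            best_name, best_perf = name, perf
    if best_name is None:
        raise ValueError("no solver supports this matrix size")
    return best_name

estimators = {"fpga": fpga_estimate, "gpu": gpu_estimate, "cpu": cpu_estimate}
choice = select_solver(100000, 100, estimators)
```

Returning `None` for unsupported sizes keeps hard limits, such as the FPGA column bound, separate from the performance comparison.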

Evaluation

Experimental Setup

• FPGA

– Xilinx Virtex-6 SX475 at 200 MHz with 2016 DSPs

– Double-precision floating-point

– Up to 275 columns (𝑛 = 275) – 5x the max no. of columns of existing FPGA work

– Any number of rows

– 84.23% BRAM utilization – post place-and-route

– 99.45% DSP utilization – post place-and-route

• GPU

– NVIDIA Tesla K20

– 2496 cores at 706 MHz

• CPU

– Intel i7-4770 at 3.40 GHz, 16 GB RAM, 8 MB cache

– 4 cores, 8 threads

Internal Comparisons

[Figure: internal comparison for no. of columns 𝑛 = 51; labels: CPU, Proposed Approach; annotated speed-ups of 4.67× and 2.74×]

[Figure: internal comparison for no. of rows 𝑚 = 6400; labels: CPU, Proposed Approach; annotated speed-up of 2×]

External Comparisons

• Speed-up of up to 25.84x against software
• Up to 32.67x against CULA
• Up to 18.07x against FPGA

[Figure: external comparison for no. of columns 𝑛 = 51; labels: CPU, GPU]

[Figure: external comparison for no. of rows 𝑚 = 6400; labels: CPU, GPU]

[4] A. Rafique et al., FPL 2012.

Conclusions

• Different solvers perform better on different matrix sizes
• Using heterogeneous solvers in a complementary way enables the high-performance solution of complex problems in fields such as genetic analysis

Thank You & Questions?