  • Michael Jandron – Naval Undersea Warfare Center // Approved for Public Release 1

    A New Asynchronous Solver for Banded Linear Systems

    2015 SIAM Conference on Applied Linear Algebra, October 29, 2015

    Michael Jandron, Naval Undersea Warfare Center (NUWC), Newport, RI
    Anthony Ruffa, NUWC, Newport, RI
    Raymond Roberts, NUWC, Newport, RI
    James Baglama, University of Rhode Island, Kingston, RI

  • Motivation

    • Large sparse problems take a long time to solve (days, months, years)
      – Direct methods are still useful
      – In FEA, substructuring, Schur complement, and multifrontal methods are common and rely on a Gaussian elimination backbone, which is difficult to parallelize
      – Always looking for ways to increase the level of parallelization and reduce the communication bound

    Looking for new techniques to complement these tried-and-true methods

  • Outline

    • Tridiagonal solver
      – Limitations and what it's good for
    • Pentadiagonal solver
    • General banded solver
      – Theoretical speedup predictions
      – Development
      – Numerical implementation
      – Numerical benchmarks
    • Conclusions and future work

  • Method for Tridiagonal Systems

    Augment the system with one extra unknown [1-3]

    Given the following linear system

    Split into two tasks

    Principle of superposition applies

    Last equation gives:

    Final vectorized superposition

    [1] Ruffa, A., “A Solution Approach for Lower Hessenberg Linear Systems,” ISRN Applied Mathematics (2011)
    [2] Jandron, M., Ruffa, A., Baglama, J., “An Asynchronous Direct Solver for Banded Linear Systems,” Numerical Algorithms (2015, Submitted)
    [3] Ruffa, A., Jandron, M., Toni, B., “Parallelized Solution of Banded Linear Systems,” STEAM-H Springer Series Contribution (2015, Submitted)
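
    The slide's equations did not survive in this transcript. The block below is one plausible reading of the two-task superposition for a tridiagonal system, not the authors' exact formulation. The symbols a_i, d_i, c_i (sub-, main, and superdiagonal), the starting values 0 and 1, and the weight alpha are all assumptions made for illustration.

```latex
% Assumed notation: a_i, d_i, c_i are the sub-, main and superdiagonal of A,
% with c_i \neq 0 and the convention a_1 x_0 = 0.
\begin{align*}
x_{i+1}^{(k)} &= \frac{r_i^{(k)} - a_i x_{i-1}^{(k)} - d_i x_i^{(k)}}{c_i},
  \qquad i = 1,\dots,n-1,\quad k = 1,2, \\
r^{(1)} &= b,\quad x_1^{(1)} = 0, \qquad r^{(2)} = 0,\quad x_1^{(2)} = 1, \\
x &= x^{(1)} + \alpha\, x^{(2)}, \\
\alpha &= \frac{b_n - a_n x_{n-1}^{(1)} - d_n x_n^{(1)}}
               {a_n x_{n-1}^{(2)} + d_n x_n^{(2)}}.
\end{align*}
```

    Under these assumptions, the first n-1 rows are satisfied by any choice of alpha, so the last row alone fixes it.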

  • System Details for Tridiagonal Systems

    Undetermined matrix: the solution is known only to within a constant

    Choose … and … arbitrarily, then solve for the remaining unknowns
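
    A minimal sketch of how two forward sweeps and one scalar constraint could be combined, assuming the starting values 0 and 1 for the two tasks. The function names, argument layout, and test matrix are hypothetical and are not taken from the talk.

```python
def mfs_tridiagonal(lower, diag, upper, b):
    """Superposition sketch for a tridiagonal system A x = b.

    lower[i] = A[i][i-1] (lower[0] unused), diag[i] = A[i][i],
    upper[i] = A[i][i+1] (upper[n-1] unused). Every used upper[i] must be nonzero.
    """
    n = len(b)

    def forward(x0, rhs):
        # Rows 0..n-2: once x[0] is chosen, row i determines x[i+1].
        x = [x0] + [0.0] * (n - 1)
        for i in range(n - 1):
            prev = lower[i] * x[i - 1] if i > 0 else 0.0
            x[i + 1] = (rhs[i] - prev - diag[i] * x[i]) / upper[i]
        return x

    xp = forward(0.0, b)           # task 1: true right-hand side, x[0] = 0 (assumed)
    xh = forward(1.0, [0.0] * n)   # task 2: zero right-hand side, x[0] = 1 (assumed)

    def last_row(x):
        # Left-hand side of the final equation, which forward() never used.
        return lower[n - 1] * x[n - 2] + diag[n - 1] * x[n - 1]

    alpha = (b[n - 1] - last_row(xp)) / last_row(xh)
    return [p + alpha * h for p, h in zip(xp, xh)]


def max_residual(lower, diag, upper, b, x):
    """Largest entry of |b - A x| for the same storage layout."""
    n = len(b)
    worst = 0.0
    for i in range(n):
        ax = diag[i] * x[i]
        if i > 0:
            ax += lower[i] * x[i - 1]
        if i < n - 1:
            ax += upper[i] * x[i + 1]
        worst = max(worst, abs(b[i] - ax))
    return worst


# Small illustrative system: the (-1, 2, -1) stencil with n = 8.
n = 8
lower = [0.0] + [-1.0] * (n - 1)
diag = [2.0] * n
upper = [-1.0] * (n - 1) + [0.0]
b = [float(i + 1) for i in range(n)]
x = mfs_tridiagonal(lower, diag, upper, b)
print(max_residual(lower, diag, upper, b, x))
```

    The two sweeps share no data, which is what would let them run as independent tasks.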

  • Limitations of Modified Forward Substitution (MFS)

    [Figures: residual error (b - Ax) and solution (x) versus unknown index k (0 to 100), comparing Backslash and MFS for two test cases. In the first case the error axis is scaled by 10^-14; in the second it spans -1 to 0.5.]

    Alternate methods?
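
    The contrast in the plots can be reproduced in spirit with a small experiment. This is a sketch under assumptions: the two constant-coefficient stencils below are chosen only for illustration and are not the test cases from the talk. The first stencil gives a recurrence without exponential growth. The second gives a recurrence with a root of magnitude 2 + sqrt(3), so rounding errors are amplified at every step.

```python
def mfs_residual(l, d, u, n):
    """Max |b - A x| for the constant-coefficient tridiagonal matrix with
    stencil (l, d, u) and a right-hand side of ones, solved by two forward
    sweeps plus one scalar constraint (starting values 0 and 1 are assumed)."""
    b = [1.0] * n

    def forward(x0, rhs):
        x = [x0] + [0.0] * (n - 1)
        for i in range(n - 1):
            prev = l * x[i - 1] if i > 0 else 0.0
            x[i + 1] = (rhs[i] - prev - d * x[i]) / u
        return x

    xp, xh = forward(0.0, b), forward(1.0, [0.0] * n)
    # The last row fixes the superposition weight.
    alpha = (b[-1] - (l * xp[-2] + d * xp[-1])) / (l * xh[-2] + d * xh[-1])
    x = [p + alpha * h for p, h in zip(xp, xh)]

    def row(i):
        left = l * x[i - 1] if i > 0 else 0.0
        right = u * x[i + 1] if i < n - 1 else 0.0
        return left + d * x[i] + right

    return max(abs(b[i] - row(i)) for i in range(n))


benign = mfs_residual(-1.0, 2.0, -1.0, 100)   # no exponential growth
growing = mfs_residual(1.0, 4.0, 1.0, 100)    # exponentially growing recurrence
print(f"benign residual:  {benign:.3e}")
print(f"growing residual: {growing:.3e}")
```

    In the growing case the partial solutions become enormous and nearly cancel in the superposition, so the combined vector loses essentially all accuracy in double precision.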

  • System Details for Tridiagonal Systems

    Option 1: A modified forward substitution scheme
      – Fast, but can be unreliable in some cases without a form of pivoting or precision control

    Option 2: Using the pseudoinverse
      – General, but can be slower and memory intensive

  • Method for Pentadiagonal Systems

    Add two variables

    Given the following linear system

    Split into three tasks

    Principle of superposition:

    The last two equations give a constraint linear system:

    How does it work for general banded systems?
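
    The three-task split can be sketched as follows, assuming one particular sweep plus two homogeneous sweeps started from unit values. The dense row storage, the function names, and the (1, -4, 6, -4, 1) test stencil are illustrative assumptions, not details from the talk.

```python
def solve_pentadiagonal(A, b):
    """Superposition sketch for a pentadiagonal system (2 sub-, 2 superdiagonals).

    A is a dense list of rows for readability; only the band is read.
    Requires n >= 4 and A[i][i+2] != 0 for every row i < n - 2.
    """
    n = len(b)

    def forward(x0, x1, rhs):
        # Rows 0..n-3: once x[0] and x[1] are chosen, row i determines x[i+2].
        x = [x0, x1] + [0.0] * (n - 2)
        for i in range(n - 2):
            known = sum(A[i][j] * x[j] for j in range(max(0, i - 2), i + 2))
            x[i + 2] = (rhs[i] - known) / A[i][i + 2]
        return x

    zero = [0.0] * n
    xp = forward(0.0, 0.0, b)      # task 1: particular solution
    h1 = forward(1.0, 0.0, zero)   # task 2: homogeneous, x[0] = 1
    h2 = forward(0.0, 1.0, zero)   # task 3: homogeneous, x[1] = 1

    def tail_row(i, x):
        # Left-hand side of one of the last two rows, unused by forward().
        return sum(A[i][j] * x[j] for j in range(max(0, i - 2), n))

    r1, r2 = n - 2, n - 1
    m11, m12 = tail_row(r1, h1), tail_row(r1, h2)
    m21, m22 = tail_row(r2, h1), tail_row(r2, h2)
    f1, f2 = b[r1] - tail_row(r1, xp), b[r2] - tail_row(r2, xp)

    # 2x2 constraint system solved by Cramer's rule.
    det = m11 * m22 - m12 * m21
    a1 = (f1 * m22 - m12 * f2) / det
    a2 = (m11 * f2 - m21 * f1) / det
    return [p + a1 * u + a2 * v for p, u, v in zip(xp, h1, h2)]


def pentadiagonal(n, stencil=(1.0, -4.0, 6.0, -4.0, 1.0)):
    """Dense matrix with the given five-point stencil on its central diagonals."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, v in zip(range(-2, 3), stencil):
            if 0 <= i + k < n:
                A[i][i + k] = v
    return A


n = 10
A = pentadiagonal(n)
b = [1.0] * n
x = solve_pentadiagonal(A, b)
residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
print(residual)
```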

  • Extension to Banded Systems

    Independent linear systems 

    Partial solution vectors 

  • Extension to Banded Systems

    Constraint Matrix 

    Superposition 
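
    The formulas for this slide were lost in extraction. The block below sketches how the independent systems, the constraint matrix, and the superposition might fit together for a matrix with p subdiagonals and q superdiagonals. The symbol p, the unit starting vectors, and the names M, r, and alpha are assumptions made for illustration and may differ from the authors' notation.

```latex
% Assumed: A has p subdiagonals and q superdiagonals, with a_{i,i+q} \neq 0.
\begin{align*}
&\text{Forward substitution (rows } i = 1,\dots,n-q\text{):} &
x_{i+q}^{(k)} &= \frac{r_i^{(k)} - \sum_{j=\max(1,\,i-p)}^{i+q-1} a_{ij}\, x_j^{(k)}}{a_{i,i+q}}, \\
&\text{Particular task:} &
r^{(0)} &= b, \quad \bigl(x_1^{(0)},\dots,x_q^{(0)}\bigr) = 0, \\
&\text{Homogeneous tasks } (k = 1,\dots,q)\text{:} &
r^{(k)} &= 0, \quad \bigl(x_1^{(k)},\dots,x_q^{(k)}\bigr) = e_k, \\
&\text{Superposition:} &
x &= x^{(0)} + \sum_{k=1}^{q} \alpha_k\, x^{(k)}, \\
&\text{Constraint (rows } i = n-q+1,\dots,n\text{):} &
\sum_{k=1}^{q} \alpha_k \bigl(A x^{(k)}\bigr)_i &= b_i - \bigl(A x^{(0)}\bigr)_i .
\end{align*}
```

    Read this way, the q + 1 forward substitutions are independent of one another, and the only coupled work is a dense q-by-q solve for the weights.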

  • Numerical Implementation

    • Request solution
    • Broadcast … to each available core
    • Begin asynchronous forward substitution as it arrives
    • Send extra variables back as they are formed
    • Master thread: once all extra variables come back, tackle the constraint matrix using any dense solver
    • Level 1 superposition to get final solution

    Even the constraint matrix can be split up if desired
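
    The control flow above can be sketched with a thread pool. This is an illustration under assumptions, not the Fortran 90/OpenMP implementation used in the talk: the function names, the dense row storage, and the unit starting vectors are hypothetical. Python threads also do not run this arithmetic in parallel, so the sketch shows only the order of operations, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def forward_task(A, p, q, start, rhs):
    """One independent forward substitution. Returns the partial solution
    vector and its left-hand sides in the last q rows (the extra variables)."""
    n = len(rhs)
    x = list(start) + [0.0] * (n - q)
    for i in range(n - q):
        known = sum(A[i][j] * x[j] for j in range(max(0, i - p), i + q))
        x[i + q] = (rhs[i] - known) / A[i][i + q]
    tail = [sum(A[i][j] * x[j] for j in range(max(0, i - p), n))
            for i in range(n - q, n)]
    return x, tail


def dense_solve(M, r):
    """Gaussian elimination with partial pivoting for the small q-by-q system."""
    q = len(r)
    M = [row[:] + [r[i]] for i, row in enumerate(M)]
    for c in range(q):
        piv = max(range(c, q), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(c + 1, q):
            f = M[k][c] / M[c][c]
            for j in range(c, q + 1):
                M[k][j] -= f * M[c][j]
    a = [0.0] * q
    for c in reversed(range(q)):
        a[c] = (M[c][q] - sum(M[c][j] * a[j] for j in range(c + 1, q))) / M[c][c]
    return a


def bmfs_async(A, b, p, q, workers=4):
    n = len(b)
    zero = [0.0] * n
    # Task 0 is the particular solution; tasks 1..q are homogeneous solutions
    # started from unit vectors (an assumed choice).
    starts = [[0.0] * q] + [[1.0 if j == k else 0.0 for j in range(q)]
                            for k in range(q)]
    results = [None] * (q + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(forward_task, A, p, q, s, b if k == 0 else zero): k
                   for k, s in enumerate(starts)}
        for fut in as_completed(futures):   # results may arrive in any order
            results[futures[fut]] = fut.result()
    xp, tp = results[0]
    # Constraint matrix: column k holds the extra variables of homogeneous task k.
    M = [[results[k + 1][1][i] for k in range(q)] for i in range(q)]
    r = [b[n - q + i] - tp[i] for i in range(q)]
    alpha = dense_solve(M, r)
    # Final superposition of the partial solution vectors.
    return [xp[i] + sum(alpha[k] * results[k + 1][0][i] for k in range(q))
            for i in range(n)]


# Illustrative pentadiagonal test case (p = q = 2).
n = 12
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for k, v in zip(range(-2, 3), (1.0, -4.0, 6.0, -4.0, 1.0)):
        if 0 <= i + k < n:
            A[i][i + k] = v
b = [1.0] * n
x = bmfs_async(A, b, p=2, q=2)
residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
print(residual)
```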

  • Banded Systems Expected Speedups

    X = speedup, q = number of superdiagonals, … = number of subdiagonals, n = number of unknowns

    Cost comparison of Seq. LU, Seq. BMFS, and q-core BMFS by stage:
      – Banded Gaussian elimination
      – Forward / backward substitution
      – Dense constraint matrix solve
      – Superposition

    Same cost

    Speedup:
      – Tridiagonal should be ~2x faster than sequential LU
      – Pentadiagonal should be ~8x faster than sequential LU
      – Heptadiagonal should be ~18x faster than sequential LU

  • Banded Systems Expected Speedups

    Anticipated speedup over sequential LU using various numbers of cores

    [Figure, two panels (n = 1,000,000 and n = 1,000,000,000): speedup X versus number of superdiagonals q for 1-core, 8-core, and q-core]

    1 core is 0.5X; 8-core is 4X

    For the same number of cores, LU (e.g. multi-frontal) must scale to these levels in order to match speed

  • Banded Systems Expected Speedups

    We know the optimal locations for maximum speedup over sequential LU

    For the same number of cores, LU (e.g. multi-frontal) must scale to these levels in order to match speed

    X = speedup, q = number of superdiagonals, … = number of subdiagonals, n = number of unknowns

  • Numerical Benchmarks

    Tests … dependence without exponential growth

    For simplicity, consider only symmetric cases

    Implementation:
      – BMFS: Fortran 90 with OpenMP on 8 cores
      – PARDISO 5.0.0 [1-3] solver on 8 cores

    [1] M. Luisier, O. Schenk, et al., “Fast Methods for Computing Selected Elements of the Green's Function in Massively Parallel Nanoelectronic Device Simulations,” Euro-Par 2013, LNCS 8097, F. Wolf, B. Mohr, and D. an Mey (Eds.), Springer-Verlag Berlin Heidelberg, pp. 533–544, 2013.
    [2] O. Schenk, M. Bollhoefer, and R. Roemer, “On Large-Scale Diagonalization Techniques for the Anderson Model of Localization,” SIAM Review 50 (2008), pp. 91–112. Featured SIGEST paper in the SIAM Review selected "on the basis of its exceptional interest to the entire SIAM community".
    [3] O. Schenk, A. Waechter, and M. Hagemann, “Matching-based Preprocessing Algorithms to the Solution of Saddle-Point Problems in Large-Scale Nonconvex Interior-Point Optimization,” Journal of Computational Optimization and Applications, Volume 36, Numbers 2-3, April 2007, pp. 321–341.

  • Numerical Results with 8 cores

    Fortran OpenMP speedup results: PARDISO (8 cores) vs. BMFS (8 cores)

    [Figure legend: BMFS – 8-core; BMFS – 8-core qxq solve; BMFS – q-core scaled; PARDISO – 8-core]

    X = speedup, q = number of superdiagonals, n = number of unknowns

    Wall time was less than PARDISO in certain cases, even without scaling

  • Numerical Results with 8 cores

    Fortran OpenMP speedup results: PARDISO (8 cores) vs. BMFS (8 cores)

    [Figure, four panels (n = 100,000; n = 500,000; n = 1,000,000; n = 5,000,000). Legend: BMFS – 8-core; BMFS – 8-core qxq solve; BMFS – q-core scaled; PARDISO – 8-core]

    X = speedup, q = number of superdiagonals, n = number of unknowns

    Increased error is likely due to round-off and errors inherent in the constraint matrix solve

  • Speedup over PARDISO Solver

    X = speedup, q = number of superdiagonals, n = number of unknowns

    From actual wall times: 8-core BMFS vs. 8-core PARDISO

    By scaling BMFS to q cores (not the qxq solve part) vs. 8-core PARDISO

  • Summary

    • Developed a direct solver, built on a superposition principle, that can skip the Gaussian elimination process when solving banded linear systems [1-3]

    • Fastest for banded systems without exponential growth
      – Observed speedup of over 20x relative to 8-thread PARDISO when using 8 threads, for small … and large …

    • Can handle exponential growth (or really any problem thrown at it) by incorporating pseudoinverse calculations, but this is less attractive

    • Future work involves:
      – Distributed memory/MPI/GPU computing
      – Can the pseudoinverse be used efficiently?
      – Can a form of pivoting be employed?

    • End goal is to develop a competitive direct solver for banded systems with an eye on FEA applications

    [1] Ruffa, A., “A Solution Approach for Lower Hessenberg Linear Systems,” ISRN Applied Mathematics (2011)
    [2] Jandron, M., Ruffa, A., Baglama, J., “An Asynchronous Direct Solver for Banded Linear Systems,” Numerical Algorithms (2015, Submitted)
    [3] Ruffa, A., Jandron, M., Toni, B., “Parallelized Solution of Banded Linear Systems,” STEAM-H Springer Series Contribution (2015, Submitted)

