Eigen - Inria
sed.bordeaux.inria.fr/seminars/eigen_20140304.pdf


– lazy linear algebra –

Gaël Guennebaud

[http://eigen.tuxfamily.org]

Outline

• Why should I use it? – general facts & features

• How to use it? – API demos

• How can it work? – lazy evals

• What's next?

Context

• Matrix computation everywhere
  – Various applications:
    • simulations, simulators, video games, audio/image processing, design, robotics, computer vision, augmented reality, etc.
  – Need various tools:
    • numerical data manipulation, space transformations
    • inverse problems, PDE, spectral analysis
  – Need performance:
    • on standard PC, smartphone, embedded systems, etc.
    • real-time performance

MatLab

+ friendly API
+ large set of features
- math only
- extremely slow for small objects

=> Prototyping


Matrix computation?

HPC libs

+ highly optimized
- 1 feature = 1 lib
+/- tailored for advanced user / clusters
- slow for small objects

=> Advanced usages


Context

[Diagram: MatLab, HPC libs, and Eigen (start: 2008)]

• Pure C++ template library
  – header only
  – no binary to compile/install
  – no configuration step
  – no dependency (optional only)

#include <iostream>
#include <Eigen/Eigen>

using namespace Eigen;

int main() {
  Matrix4f A = Matrix4f::Random();
  std::cout << A << std::endl;
}

$ g++ -O2 example.cpp -o example


• Packaged by all Linux distributions (incl. MacPorts)

• Open source: MPL2

→ easy to install & distribute

Multi-platforms

• Supported compilers:
  – GCC, MSVC, Intel ICC, Clang/LLVM

• Supported systems:
  – Linux, Windows, OSX, iOS, Android, ...

• Supported SIMD vectorization engines:
  – SSE* (x86), NEON (ARM), Altivec (PowerPC)
  – Soon: AVX, future: MIC?

Large feature set

– Core
  • Matrix and array manipulation (~MatLab, 1D & 2D)
  • Basic linear algebra (~BLAS)
    – incl. triangular & self-adjoint matrix

– LU, Cholesky, QR, SVD, Eigenvalues
  • Matrix decompositions and linear solvers (~Lapack)

– Geometry (transformations, …)

– Sparse
  • Manipulation
  • Solvers (LLT, LU, QR & CG, BiCGSTAB, GMRES)

– WIP modules (autodiff, non-linear opt., FFT, Tensors, etc.)

→ “unified API” - “all-in-one”

Optimized for both small and large objects

• Small objects
  – means fixed sizes: Matrix<float,4,4>
  – malloc-free
  – meta unrolling
  – specialized algo

• Large objects
  – means dynamic sizes: Matrix<float,Dynamic,1>
  – cache friendly kernels
  – multi-threading (OpenMP)

– Vectorization (SIMD)
– Unified API → write generic code
– Mixed fixed/dynamic dimensions
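As a rough illustration of how one API can cover both cases, here is a minimal standard-C++ sketch. It is not Eigen's implementation: `Dynamic`, `Storage`, `Vec` and `squared_norm` are hypothetical names invented for this example. The fixed-size case keeps its coefficients inside the object (no heap allocation), while the dynamic case allocates at run time.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical sentinel, playing the role of Eigen::Dynamic in this sketch.
constexpr int Dynamic = -1;

// Fixed-size storage: a plain array inside the object, no heap allocation.
template <typename Scalar, int Size>
struct Storage {
    std::array<Scalar, static_cast<std::size_t>(Size)> data{};
    explicit Storage(int /*size*/ = Size) {}
    int size() const { return Size; }  // known at compile time
};

// Dynamic-size storage: the size is chosen at run time, memory is on the heap.
template <typename Scalar>
struct Storage<Scalar, Dynamic> {
    std::vector<Scalar> data;
    explicit Storage(int size = 0) : data(static_cast<std::size_t>(size)) {}
    int size() const { return static_cast<int>(data.size()); }
};

// One vector class, one API, two storage strategies.
template <typename Scalar, int Size = Dynamic>
class Vec {
public:
    explicit Vec(int size = (Size == Dynamic ? 0 : Size)) : s_(size) {}
    int size() const { return s_.size(); }
    Scalar& operator[](int i) { return s_.data[static_cast<std::size_t>(i)]; }
    const Scalar& operator[](int i) const { return s_.data[static_cast<std::size_t>(i)]; }
private:
    Storage<Scalar, Size> s_;
};

// Generic code: works unchanged for fixed and dynamic sizes.
template <typename Scalar, int Size>
Scalar squared_norm(const Vec<Scalar, Size>& v) {
    Scalar sum = 0;
    for (int i = 0; i < v.size(); ++i) sum += v[i] * v[i];
    return sum;
}
```

With a fixed size, the loop bound in `squared_norm` is a compile-time constant, which is what gives a compiler the opportunity to unroll it.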

Generic code (1/2)

• Non-generic code:

    class Sphere {
      float center[3];
      float radius;
      /* … */
    };

• Write generic code:

    template <int AmbientDim=Eigen::Dynamic>
    class HyperSphere {
      Matrix<float,AmbientDim,1> center;
      float radius;
      /* … */
    };

    typedef HyperSphere<2> HyperSphere2;
    typedef HyperSphere<3> HyperSphere3;
    typedef HyperSphere<3> Sphere;
    typedef HyperSphere<> HyperSphereX;

  – Eigen takes care of the low level optimizations

Generic code (2/2)

• Write fully generic code:

    template <typename Scalar, int AmbientDim=Eigen::Dynamic>
    class HyperSphere {
      Matrix<Scalar,AmbientDim,1> center;
      Scalar radius;
      /* … */
    };

    typedef HyperSphere<float, 2> HyperSphere2f;
    typedef HyperSphere<double,3> HyperSphere3d;

Custom scalar types

• Can use custom types everywhere
  – Exact arithmetic (rational numbers)
  – Multi-precision numbers (e.g., via mpfr++)
  – Auto-diff scalar types
  – Interval
  – (Symbolic?)

• Example:

    typedef Matrix<mpreal,Dynamic,Dynamic> MatrixMP;
    MatrixMP A, B, X;
    // init A and B
    // solve for A.X=B using LU decomposition
    X = A.lu().solve(B);

Communication with the world

→ standard matrix representations

• to Eigen

    float* raw_data = static_cast<float*>(malloc(...));
    Map<MatrixXf> M(raw_data, rows, cols);
    // use M as a MatrixXf
    M = M.inverse();

• from Eigen

    MatrixXf M;
    float* raw_data = M.data();
    int stride = M.outerStride();
    raw_data[i+j*stride]   // coefficient (i,j), column-major storage

→ same for sparse matrices
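The indexing rule `raw_data[i+j*stride]` can be tried without Eigen. The sketch below uses only the standard library; `coeff` and `make_demo_buffer` are hypothetical helper names, and the buffer is assumed to be column-major with a stride equal to the number of rows, the layout of a plain, default (column-major) Eigen matrix.

```cpp
#include <cstddef>
#include <vector>

// Coefficient (i, j) of a column-major matrix stored in a flat buffer.
// `stride` is the distance, in elements, between two consecutive columns.
inline float& coeff(float* data, int i, int j, int stride) {
    return data[i + j * stride];
}

// Fill a rows x cols column-major buffer with the value 10*i + j at (i, j).
inline std::vector<float> make_demo_buffer(int rows, int cols) {
    std::vector<float> buf(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    for (int j = 0; j < cols; ++j)
        for (int i = 0; i < rows; ++i)
            coeff(buf.data(), i, j, rows) = 10.0f * static_cast<float>(i) + static_cast<float>(j);
    return buf;
}
```

Consecutive buffer entries walk down a column first, so for a 2 x 3 matrix the entry at (1,0) sits directly after the entry at (0,0).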

Eigen & BLAS

• Call Eigen's algorithms through a BLAS/Lapack API
  – Alternative to ATLAS, OpenBlas, Intel MKL
    • e.g., sparse solvers, Octave, Plasma, etc.
  – Run the Lapack test suite on Eigen

[Diagram: Eigen's algorithms, Eigen's API, BLAS/Lapack API, existing other libs/apps]

External backends

• External backends
  – Fallback to existing BLAS/Lapack/etc. (done by Intel)
  – Unified interface to many sparse solvers:
    • UmfPack, Cholmod, PaStiX, Pardiso, SPQR

[Diagram: user code, Eigen's API, Eigen's algorithms, BLAS/Lapack API, other libs (BLAS, solver, ...), e.g. PaStiX]

API demo

Remarks

• No aim to mimic MatLab syntax

• Matrix algebra vs Array worlds

• No magic “\”
  – Explicit decomposition choice
    • LLT, LDLT (pivoting), LU (full/row pivoting), QR (none, column, full pivoting), SVD (Jacobi/D&C)

    MatrixXd A;
    VectorXd b, x;

    PartialPivLU<MatrixXd> lu(A);
    x = lu.solve(b);
    b = …;
    x = lu.solve(b);

Ex. Dense solver: RBF fitting

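• RBFs:

    $f(x) = \sum_j \alpha_j \, \phi(\|x - q_j\|)$

  – with: $\phi(t) = t^3$

input:
• sample positions $p_i$
• with associated values $f_i$

output:
• a smooth scalar field $f : \mathbb{R}^d \to \mathbb{R}$

    $f = \arg\min \sum_i \left( f(p_i) - f_i \right)^2$

matrix form:

    $\begin{bmatrix} & \vdots & \\ \cdots & \phi(\|p_i - q_j\|) & \cdots \\ & \vdots & \end{bmatrix} \cdot \alpha = \begin{bmatrix} \vdots \\ f_i \\ \vdots \end{bmatrix}$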
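A tiny worked instance may help to read the matrix form. It assumes d = 1, two samples p_1 = 0 and p_2 = 1, and centers placed on the samples (q_j = p_j); these choices are illustrative assumptions, not taken from the talk.

```latex
% Entries phi(|p_i - q_j|) with phi(t) = t^3, p = q = (0, 1):
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}
=
\begin{bmatrix} f_1 \\ f_2 \end{bmatrix}
\quad\Rightarrow\quad
\alpha_1 = f_2, \qquad \alpha_2 = f_1

% Resulting field, and a check at the two samples:
f(x) = f_2 \, |x|^3 + f_1 \, |x - 1|^3,
\qquad f(0) = f_1, \qquad f(1) = f_2
```

With more samples the matrix is dense and no longer trivially invertible, which is where a dense decomposition comes in.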

Sparse Matrix

• Representation & manipulations
  – compressed format

• Built-in solvers (Ax=b)
  – direct: simplicial Cholesky, QR, supernodal LU
  – iterative: CG, BiCGSTAB (ILUT)

    typedef SparseMatrix<double,ColMajor> SpMat;
    SpMat A(rows,cols);
    A.setFromTriplets(elements.begin(), elements.end());
    SimplicialCholesky<SpMat,Lower> chol_A(A);
    x = chol_A.solve(b);
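To make "compressed format" concrete, here is a minimal standard-C++ sketch of a column-compressed layout built from a triplet list. It is not Eigen's code: `Triplet`, `CompressedColumns`, `from_triplets` and `multiply` are hypothetical names, duplicate (row, col) entries are assumed absent, and rows inside a column are left in input order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch; not Eigen's classes.
struct Triplet { int row, col; double value; };

struct CompressedColumns {
    int rows = 0, cols = 0;
    std::vector<int> col_start;   // size cols + 1; column j spans [col_start[j], col_start[j+1])
    std::vector<int> row_index;   // row of each stored coefficient
    std::vector<double> values;   // value of each stored coefficient
};

// Build the compressed storage from an unordered triplet list.
// Assumes no duplicate (row, col) pairs, to keep the sketch short.
CompressedColumns from_triplets(int rows, int cols, const std::vector<Triplet>& ts) {
    CompressedColumns m;
    m.rows = rows;
    m.cols = cols;
    m.col_start.assign(static_cast<std::size_t>(cols) + 1, 0);
    // 1) Count the entries of each column.
    for (const Triplet& t : ts) ++m.col_start[t.col + 1];
    // 2) Prefix sum: col_start[j] becomes the offset of column j.
    for (int j = 0; j < cols; ++j) m.col_start[j + 1] += m.col_start[j];
    // 3) Scatter each triplet into the next free slot of its column.
    m.row_index.resize(ts.size());
    m.values.resize(ts.size());
    std::vector<int> next(m.col_start.begin(), m.col_start.end() - 1);
    for (const Triplet& t : ts) {
        int k = next[t.col]++;
        m.row_index[k] = t.row;
        m.values[k] = t.value;
    }
    return m;
}

// y = A * x, touching only the stored coefficients.
std::vector<double> multiply(const CompressedColumns& A, const std::vector<double>& x) {
    std::vector<double> y(static_cast<std::size_t>(A.rows), 0.0);
    for (int j = 0; j < A.cols; ++j)
        for (int k = A.col_start[j]; k < A.col_start[j + 1]; ++k)
            y[A.row_index[k]] += A.values[k] * x[j];
    return y;
}
```

Storage and the matrix-vector product both scale with the number of stored coefficients rather than with rows × cols.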

Ex. Sparse Solver : FEM

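• Poly-harmonic interpolation: $\Delta^k f = 0$

• FEM on a triangular mesh: $L^k f = 0$, with

    $L_{i,j} = \langle \nabla\phi_i, \nabla\phi_j \rangle = \cot\alpha_{ij} + \cot\beta_{ij}$

    $L_{i,i} = -\sum_{v_j \in N_1(v_i)} L_{i,j}$

  – some values are fixed:

    $\begin{bmatrix} L_{00} & L_{01} \\ L_{10} & L_{11} \end{bmatrix} \cdot \begin{bmatrix} f \\ \bar{f} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; L_{00} \cdot f = -L_{01} \cdot \bar{f}$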

Sparse Benchmark

• Automatic benchmark of a set of problems/solvers

– precompiled module on Plafrim?

Space Transformations

• Example:

    Transform<float,3,Affine> T;

    T = Translation3f(p) * rot * Translation3f(-p) * Scaling(2.f);

    v' = T * v;

  where rot can be a Quaternionf(...), an AngleAxisf(...), or a Matrix3f(...)
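Read from right to left, the product applied to v first scales by 2, then moves p to the origin, rotates, and moves back. Written out, assuming R denotes the 3×3 rotation matrix equivalent to rot (a symbol introduced here, not on the slide):

```latex
v' = T\,v = p + R\,(2v - p) = 2R\,v + (p - R\,p)
```

So T has linear part 2R and translation part p − Rp, whichever representation is chosen for rot.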

Doc & Community

• Documentation tips
  – Be careful with doxygen class doc
  – Quick MatLab ↔ Eigen guide
  – Quick reference pages

• User community
  – Active project with many users
    • Website: ~33k unique visitors / month
  – Support
    • Forum, IRC, Mailing-List
    • Bugzilla (bug, feature request, patches)

When you write your essays in programming language...

I asked for one copy, not four hundred.

Expression templates

• Example:

    m3 = m1 + m2 + m3;

• Standard C++ way:

    tmp1 = m1 + m2;
    tmp2 = tmp1 + m3;
    m3 = tmp2;

• Example:

    m3 = m1 + m2 + m3;

• Expression templates:
  – “+” returns an expression => expression tree
  – e.g.: A+B returns:

    Sum<type_of_A, type_of_B> {
      const type_of_A &A;
      const type_of_B &B;
    };

  – complete example:

    Assign<Matrix, Sum< Sum<Matrix,Matrix> , Matrix > >

• Example:

    m3 = m1 + m2 + m3;

• Evaluation:
  – Top-down creation of an evaluator
    • e.g.:

    Evaluator<Sum<type_of_A,type_of_B> > {
      Evaluator<type_of_A> evalA(A);
      Evaluator<type_of_B> evalB(B);
      Scalar coeff(i,j) { return evalA.coeff(i,j) + evalB.coeff(i,j); }
    };

  – Assignment produces:

    for(i=0; i<m3.size(); ++i)
      m3[i] = m1[i] + m2[i] + m3[i];
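The mechanism can be reproduced in a few lines of standard C++. The sketch below is a deliberately simplified stand-in, not Eigen's design: `Vec`, `Sum` and `demo` are hypothetical names, the evaluator is folded into the expression node itself, and only 1D float vectors and "+" are handled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal vector type for this sketch (not an Eigen class).
struct Vec {
    std::vector<float> data;
    explicit Vec(std::size_t n, float value = 0.0f) : data(n, value) {}
    std::size_t size() const { return data.size(); }
    float coeff(std::size_t i) const { return data[i]; }

    // Assignment from any expression: one loop, no temporary vector.
    template <typename Expr>
    Vec& operator=(const Expr& e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.coeff(i);
        return *this;
    }
};

// Node of the expression tree: holds references only, computes nothing yet.
template <typename A, typename B>
struct Sum {
    const A& a;
    const B& b;
    std::size_t size() const { return a.size(); }
    float coeff(std::size_t i) const { return a.coeff(i) + b.coeff(i); }
};

// "+" returns an expression instead of a computed result.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <typename A, typename B>
Sum<Sum<A, B>, Vec> operator+(const Sum<A, B>& a, const Vec& b) { return {a, b}; }

// Usage: evaluates m3[i] = m1[i] + m2[i] + m3[i] in a single pass.
inline Vec demo() {
    Vec m1(4, 1.0f), m2(4, 2.0f), m3(4, 3.0f);
    m3 = m1 + m2 + m3;  // right-hand side has type Sum<Sum<Vec, Vec>, Vec>
    return m3;
}
```

Because the nodes store references, an expression like this must be consumed within the statement that builds it; keeping it around in a variable would leave dangling references to the intermediate `Sum`.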

Eigen as a code generator

    #include <Eigen/Core>
    using namespace Eigen;

    void foo(Matrix2f& u, float a, const Matrix2f& v,
             float b, const Matrix2f& w)
    {
      u = a*v + b*w - u;
    }

    movl    8(%ebp), %edx
    movss   20(%ebp), %xmm0
    movl    24(%ebp), %eax
    movaps  %xmm0, %xmm2
    shufps  $0, %xmm2, %xmm2
    movss   12(%ebp), %xmm0
    movaps  %xmm2, %xmm1
    mulps   (%eax), %xmm1
    shufps  $0, %xmm0, %xmm0
    movl    16(%ebp), %eax
    mulps   (%eax), %xmm0
    addps   %xmm1, %xmm0
    subps   (%edx), %xmm0
    movaps  %xmm0, (%edx)

ET: Immediate benefits

• Fused operations
  – Temporary removal
  – Reduce memory accesses, cache misses

• Better API, ex:

    x.col(4) = A.lu().solve(B.col(5));

    x = b * A.triangularView<Lower>().inverse();

• Better unrolling

• Better vectorization

Top-down expression analysis

• Products
  – detect BLAS-like sub expressions
    • e.g.:

    m4 -= 2 * m2.adjoint() * m3;
    → gemm<Adj,Nop>(m2, m3, -2, m4);

    • e.g.:

    m4.block(...) += ((2+4i) * m2).adjoint() * m3.block(...).transpose();

  – reorganize expressions

    m4 -= m1 + m2 * m3;
    → m4 -= m1;
      m4 -= m2 * m3;

  – triple products:

    res += m1 * m2 * v;
    → res += m1 * (m2 * v);
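The gain from re-parenthesizing the triple product can be estimated by counting multiply-adds. The count below assumes m1 and m2 are n × n and v is a vector of size n; the sizes are an assumption made for illustration.

```latex
% Left-to-right evaluation: one matrix-matrix product, then one matrix-vector product
(m_1 \, m_2) \, v : \quad n^3 + n^2 \ \text{multiply-adds}

% Reorganized evaluation: two matrix-vector products
m_1 \, (m_2 \, v) : \quad n^2 + n^2 = 2n^2 \ \text{multiply-adds}

% Example with n = 1000
n^3 + n^2 = 1.001 \times 10^{9}
\qquad \text{versus} \qquad
2n^2 = 2 \times 10^{6}
```

The reorganized form also needs only a vector-sized temporary instead of an n × n one.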

Last slide

• Goal: → ideal compromise between:
  – versatility; ease of use; performance

• Work in Progress:
  – AVX, CUDA
  – Matrix functions
  – nD-Tensors
  – SparseMatrixBlock (including direct solvers)
  – Non-linear solvers: LM, QP
  – Auto-diff, Polynomials

http://eigen.tuxfamily.org
