Post on 15-Aug-2020
Eigen
– lazy linear algebra –
Gaël Guennebaud
[http://eigen.tuxfamily.org]
Outline
• Why should I use it?
  – general facts & features
• How to use it?
  – API demos
• How can it work?
  – lazy evaluation
• What's next?
Context
• Matrix computation everywhere
  – Various applications:
    • simulations, simulators, video games, audio/image processing, design, robotics, computer vision, augmented reality, etc.
  – Need various tools:
    • numerical data manipulation, space transformations
    • inverse problems, PDE, spectral analysis
  – Need performance:
    • on standard PC, smartphone, embedded systems, etc.
    • real-time performance
Matrix computation?
MatLab
+ friendly API
+ large set of features
- math only
- extremely slow for small objects
=> Prototyping
HPC libs
+ highly optimized
- 1 feature = 1 lib
+/- tailored for advanced users / clusters
- slow for small objects
=> Advanced usages
?
Context
[Diagram: Eigen (start: 2008) placed between MatLab and HPC libs]
Facts
• Pure C++ template library
  – header only
  – no binary to compile/install
  – no configuration step
  – no mandatory dependency (optional ones only)
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;
int main() {
  Matrix4f A = Matrix4f::Random();  // 4x4 float matrix with random coefficients
  std::cout << A << std::endl;
}
$ g++ -O2 example.cpp -o example
• Packaged by all Linux distributions (and MacPorts)
• Open source: MPL2
→ easy to install & distribute
Multi-platforms
• Supported compilers:
  – GCC, MSVC, Intel ICC, Clang/LLVM
• Supported systems:
  – Linux, Windows, OS X, iOS, Android, ...
• Supported SIMD vectorization engines:
  – SSE* (x86), NEON (ARM), AltiVec (PowerPC)
  – Soon: AVX, future: MIC?
Large feature set
– Core
  • Matrix and array manipulation (~MatLab, 1D & 2D)
  • Basic linear algebra (~BLAS)
    – incl. triangular & self-adjoint matrices
  • Matrix decompositions and linear solvers (~Lapack)
    – LU, Cholesky, QR, SVD, eigenvalues
– Geometry (transformations, …)
– Sparse
  • Manipulation
  • Solvers (LLT, LU, QR & CG, BiCGSTAB, GMRES)
– WIP modules (autodiff, non-linear opt., FFT, tensors, etc.)
→ “unified API” - “all-in-one”
Optimized for both small and large objects
• Small objects
  – means fixed sizes: Matrix<float,4,4>
  – malloc-free
  – meta unrolling
  – specialized algorithms
• Large objects
  – means dynamic sizes: Matrix<float,Dynamic,1>
  – cache-friendly kernels
  – multi-threading (OpenMP)
• Vectorization (SIMD)
• Unified API → write generic code
• Mixed fixed/dynamic dimensions
Generic code (1/2)
• Non-generic code:
class Sphere {
  float center[3];
  float radius;
  /* … */
};
• Write generic code:
template <int AmbientDim = Eigen::Dynamic>
class HyperSphere {
  Matrix<float, AmbientDim, 1> center;
  float radius;
  /* … */
};
typedef HyperSphere<2> HyperSphere2;
typedef HyperSphere<3> HyperSphere3;
typedef HyperSphere<3> Sphere;
typedef HyperSphere<> HyperSphereX;
  – Eigen takes care of the low-level optimizations
Generic code (2/2)
• Write fully generic code:
template <typename Scalar, int AmbientDim = Eigen::Dynamic>
class HyperSphere {
  Matrix<Scalar, AmbientDim, 1> center;
  Scalar radius;
  /* … */
};
typedef HyperSphere<float, 2> HyperSphere2f;
typedef HyperSphere<double, 3> HyperSphere3d;
Custom scalar types
• Can use custom types everywhere
  – Exact arithmetic (rational numbers)
  – Multi-precision numbers (e.g., via mpfr++)
  – Auto-diff scalar types
  – Interval
  – (Symbolic?)
• Example:
typedef Matrix<mpreal,Dynamic,Dynamic> MatrixMP;
MatrixMP A, B, X;
// init A and B
// solve for A.X=B using LU decomposition
X = A.lu().solve(B);
Communication with the world
→ standard matrix representations
• to Eigen
double* raw_data = static_cast<double*>(malloc(...));
Map<MatrixXd> M(raw_data, rows, cols);
// use M as a MatrixXd (the scalar type of the buffer must match: double)
M = M.inverse();
• from Eigen
MatrixXd M;
double* raw_data = M.data();
int stride = M.outerStride();
raw_data[i+j*stride]  // coefficient (i,j) in the default column-major layout
→ same for sparse matrices
Eigen & BLAS
• Call Eigen's algorithms through a BLAS/Lapack API
  – Alternative to ATLAS, OpenBLAS, Intel MKL
    • e.g., sparse solvers, Octave, Plasma, etc.
  – Run the Lapack test suite on Eigen
[Diagram: Eigen's algorithms, Eigen's API, BLAS/Lapack API, existing other libs/apps]
External backends
• External backends
  – Fallback to existing BLAS/Lapack/etc. (done by Intel)
  – Unified interface to many sparse solvers:
    • UmfPack, Cholmod, PaStiX, Pardiso, SPQR
[Diagram: user code, Eigen's API, Eigen's algorithms, BLAS/Lapack API, other libs (BLAS, solver, ..., e.g. PaStiX)]
API demo
Remarks
• No aim to mimic MatLab syntax
• Matrix algebra vs. array worlds
• No magic “\”
  – Explicit choice of decomposition
    • LLT, LDLT (pivoting), LU (full/row pivoting), QR (none, column, full pivoting), SVD (Jacobi/D&C)
MatrixXd A;
VectorXd b, x;
PartialPivLU<MatrixXd> lu(A);  // factorize once
x = lu.solve(b);
b = …;
x = lu.solve(b);               // reuse the factorization for a new right-hand side
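Ex. Dense solver: RBF fitting
• input:
  – sample positions $p_i$
  – with associated values $f_i$
• output:
  – a smooth scalar field $f : \mathbb{R}^d \to \mathbb{R}$, $f = \arg\min \sum_i \left( f(p_i) - f_i \right)^2$
• RBFs: $f(x) = \sum_j \alpha_j \, \phi(\| x - q_j \|)$
  – with: $\phi(t) = t^3$
• matrix form: $\begin{bmatrix} & \vdots & \\ \cdots & \phi(\| p_i - q_j \|) & \cdots \\ & \vdots & \end{bmatrix} \cdot \alpha = \begin{bmatrix} \vdots \\ f_i \\ \vdots \end{bmatrix}$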
Sparse Matrix
• Representation & manipulations
  – compressed format
• Built-in solvers (Ax=b)
  – direct: simplicial Cholesky, QR, supernodal LU
  – iterative: CG, BiCGSTAB (ILUT)
typedef SparseMatrix<double,ColMajor> SpMat;
SpMat A(rows,cols);
A.setFromTriplets(elements.begin(), elements.end());
SimplicialCholesky<SpMat,Lower> chol_A(A);
x = chol_A.solve(b);
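Ex. Sparse Solver: FEM
• Poly-harmonic interpolation: $\Delta^k f = 0$
• FEM on a triangular mesh: $L^k f = 0$
  $L_{i,j} = \langle \nabla\phi_i, \nabla\phi_j \rangle = \cot\alpha_{ij} + \cot\beta_{ij}$
  $L_{i,i} = -\sum_{v_j \in N_1(v_i)} L_{i,j}$
  – some values are fixed:
  $\begin{bmatrix} L_{00} & L_{01} \\ L_{10} & L_{11} \end{bmatrix} \cdot \begin{bmatrix} f \\ \bar{f} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; L_{00} \cdot f = -L_{01} \cdot \bar{f}$
[Figure: mesh edge $(v_i, v_j)$ with the two opposite vertices $v_k$, $v_l$ and opposite angles $\alpha_{ij}$, $\beta_{ij}$]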
Sparse Benchmark
• Automatic benchmark of a set of problems/solvers
– precompiled module on Plafrim?
Space Transformations
• Example:
Transform<float,3,Affine> T;
T = Translation3f(p) * rot * Translation3f(-p) * Scaling(2.f);
Vector3f v_prime = T * v;  // v' = T * v
  – where rot can be any of: Quaternionf(...), AngleAxisf(...), Matrix3f(...)
Doc & Community
• Documentation tips
  – Be careful with doxygen class doc
  – Quick MatLab ↔ Eigen guide
  – Quick reference pages
• User community
  – Active project with many users
    • Website: ~33k unique visitors / month
  – Support
    • Forum, IRC, Mailing-List
    • Bugzilla (bug, feature request, patches)
[Cartoon, labelled “C++”: “When you write your essays in programming language...” / “I asked for one copy, not four hundred.”]
Expression templates
• Example:
m3 = m1 + m2 + m3;
• Standard C++ way:
tmp1 = m1 + m2;
tmp2 = tmp1 + m3;
m3 = tmp2;
• Expression templates:
  – “+” returns an expression => expression tree
  – e.g.: A+B returns:
Sum<type_of_A, type_of_B> {
  const type_of_A &A;
  const type_of_B &B;
};
  – complete example: m3 = m1 + m2 + m3; has the type
Assign<Matrix, Sum< Sum<Matrix,Matrix> , Matrix > >
[Diagram: expression tree with “=” at the root, whose children are m3 and a “+” node; that “+” has a nested “+” (over m1 and m2) and m3 as children]
• Evaluation:
  – Top-down creation of an evaluator
    • e.g.:
Evaluator<Sum<type_of_A,type_of_B> > {
  Evaluator<type_of_A> evalA(A);
  Evaluator<type_of_B> evalB(B);
  Scalar coeff(i,j) { return evalA.coeff(i,j) + evalB.coeff(i,j); }
};
  – Assignment produces:
for(i=0; i<m3.size(); ++i)
  m3[i] = m1[i] + m2[i] + m3[i];
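The expression-template mechanism described above can be reproduced in a few lines of standard C++. The sketch below is a deliberately simplified 1D illustration, not Eigen's actual implementation: the types `Vec` and `Sum` and the method `coeff` are hypothetical stand-ins for the slide's Matrix, Sum and Evaluator.

```cpp
#include <cstddef>
#include <vector>

// Leaf of the expression tree: owns its coefficients.
struct Vec {
  std::vector<double> data;
  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
  std::size_t size() const { return data.size(); }
  double coeff(std::size_t i) const { return data[i]; }

  // Assigning any expression runs one fused loop, with no temporary vector.
  template <typename Expr>
  Vec& operator=(const Expr& e) {
    for (std::size_t i = 0; i < size(); ++i) data[i] = e.coeff(i);
    return *this;
  }
};

// Node for A + B: stores references only; nothing is computed yet.
template <typename A, typename B>
struct Sum {
  const A& a;
  const B& b;
  std::size_t size() const { return a.size(); }
  double coeff(std::size_t i) const { return a.coeff(i) + b.coeff(i); }
};

// "+" returns an expression node rather than a result.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <typename A, typename B>
Sum<Sum<A, B>, Vec> operator+(const Sum<A, B>& a, const Vec& b) { return {a, b}; }
```

With these definitions, `m3 = m1 + m2 + m3;` builds a `Sum<Sum<Vec,Vec>,Vec>` and evaluates it coefficient by coefficient inside `operator=`. The nodes hold references, so an expression must be consumed in the same statement that creates it.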
Eigen as a code generator
#include <Eigen/Core>
using namespace Eigen;
void foo(Matrix2f& u, float a, const Matrix2f& v, float b, const Matrix2f& w)
{
  u = a*v + b*w - u;
}
Generated assembly (x86, SSE):
movl 8(%ebp), %edx
movss 20(%ebp), %xmm0
movl 24(%ebp), %eax
movaps %xmm0, %xmm2
shufps $0, %xmm2, %xmm2
movss 12(%ebp), %xmm0
movaps %xmm2, %xmm1
mulps (%eax), %xmm1
shufps $0, %xmm0, %xmm0
movl 16(%ebp), %eax
mulps (%eax), %xmm0
addps %xmm1, %xmm0
subps (%edx), %xmm0
movaps %xmm0, (%edx)
ET: Immediate benefits
• Fused operations
  – Temporary removal
  – Reduce memory accesses, cache misses
• Better unrolling
• Better vectorization
• Better API, ex:
x.col(4) = A.lu().solve(B.col(5));
x = b * A.triangularView<Lower>().inverse();
Top-down expression analysis
• Products
  – detect BLAS-like sub expressions
    • e.g.: m4 -= 2 * m2.adjoint() * m3;
      → gemm<Adj,Nop>(m2, m3, -2, m4);
    • e.g.: m4.block(...) += ((2+4i) * m2).adjoint() * m3.block(...).transpose();
  – reorganize expressions
    m4 -= m1 + m2 * m3;  →  m4 -= m1; m4 -= m2 * m3;
  – triple products:
    res += m1 * m2 * v;  →  res += m1 * (m2 * v);
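The triple-product rewrite pays off because of operation counts. As a rough estimate, assuming m1 and m2 are dense n×n matrices and v is an n-vector (sizes the slides do not state), and counting one multiply and one add per coefficient pair:

```latex
\begin{align*}
\text{cost}\big((m_1 m_2)\,v\big) &\approx 2n^3 + 2n^2 \ \text{flops} \\
\text{cost}\big(m_1\,(m_2 v)\big) &\approx 2n^2 + 2n^2 = 4n^2 \ \text{flops}
\end{align*}
```

For n = 1000 this is roughly 2·10^9 versus 4·10^6 operations, which is why evaluating the matrix-vector product first is the better order.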
Last slide
• Goal: → ideal compromise between:
  – versatility; ease of use; performance
• Work in Progress:
  – AVX, CUDA
  – Matrix functions
  – nD-Tensors
  – SparseMatrixBlock (including direct solvers)
  – Non-linear solvers: LM, QP
  – Auto-diff, Polynomials
http://eigen.tuxfamily.org