Session:
Intel® Performance Libraries: Intel® Math Kernel Library (MKL)
INTEL CONFIDENTIAL
Intel® Academic Community
Agenda
• Intel MKL purpose
• Why is Intel MKL faster?
• Overview of MKL
• Intel MKL environment
• The Library Sections
• Linking with Intel MKL
• Threading in Intel MKL
SOFTWARE AND SERVICES
Copyright © 2014, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.
Intel® Math Kernel Library Purpose
Performance, Performance, Performance!
Intel’s engineering, scientific, and financial math library
Addresses:
• Solvers (BLAS, LAPACK)
• Eigenvector/eigenvalue solvers (BLAS, LAPACK)
• Some quantum chemistry needs (dgemm)
• PDEs, signal processing, seismic, solid-state physics (FFTs)
• General scientific, financial [vector transcendental functions (VML) and vector random number generators (VSL)]
• Sparse Solvers (PARDISO, DSS & ISS)
Tuned for Intel® processors, current and future
Application Areas That Could Use MKL
Energy - Reservoir simulation, Seismics, Electromagnetics, etc.
Finance - Options pricing, Mortgage pricing, Financial portfolio management, etc.
Manufacturing - CAD, FEA, etc.
Applied mathematics
• Linear programming, Quadratic programming, Boundary value problems, Nonlinear parameter estimation, Homotopy calculations, Curve and surface fitting, Numerical integration, Fixed-point methods, Partial and ordinary differential equations, Statistics, Optimal control and system theory
Physics & Computer science
• Spectroscopy, Fluid dynamics, Optics, Geophysics, seismology, and hydrology, Electromagnetism, Neural network training, Computer vision, Motion estimation and robotics
Chemistry
• Physical chemistry, Chemical engineering, Study of transition states, Chemical kinetics, Molecular modeling, Crystallography, Mass transfer, Speciation
Engineering
• Structural engineering, Transportation analysis, Energy distribution networks, Radar applications, Modeling and mechanical design, Circuit design
Biology and medicine
• Magnetic resonance applications, Rheology, Pharmacokinetics, Computer-aided diagnostics, Optical tomography
Economics and sociology
• Random utility models, Game theory and international negotiations, Financial portfolio management
Why is Intel MKL faster?
Optimization done for maximum speed.
Resource-limited optimization: exhaust one or more resources of the system:
– CPU: Register use, FP units.
– Cache: Keep data in cache as long as possible; deal with cache interleaving.
– TLBs: Maximally use data on each page.
– Memory bandwidth: Minimally access memory.
– Computer: Use all the processor cores available using threading.
– System: Use all the nodes available.
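The cache bullet above is the idea behind blocking (tiling). As a minimal sketch, not MKL's actual kernel, the loop nest below multiplies row-major n x n matrices tile by tile, so each TILE x TILE block of a, b and c is reused while it is still in cache. The tile size of 32 and the name matmul_blocked are illustrative assumptions; production libraries tune tile sizes per cache level and also block for registers and TLB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge; real libraries tune this per cache level. */
#define TILE 32

/* c += a * b for row-major n x n matrices, processed tile by tile so the
   working set of three TILE x TILE blocks can stay resident in cache. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                /* clamp the last tile when n is not a multiple of TILE */
                size_t i_end = ii + TILE < n ? ii + TILE : n;
                size_t k_end = kk + TILE < n ? kk + TILE : n;
                size_t j_end = jj + TILE < n ? jj + TILE : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[n * i + k];   /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            c[n * i + j] += aik * b[n * k + j];
                    }
            }
}
```

Like the "roll your own" loop later in the deck, this accumulates into c, so c must start at zero. Only the loop order changes, not the arithmetic.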
BLAS Performance: Multiple Threads
• Performance (DGEMM function)
• Excellent scaling on multiprocessors
• Intel MKL performs far better than ATLAS* on multi-core
– 100% better than ATLAS
– 200% better for larger matrices
Intel may make changes to specification, product descriptions, and plans at any time, without notice. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products or call (U.S.) 1-800-628-8686 or 1-916-356-3104
Intel® Math Kernel Library Contents
BLAS
– Basic vector-vector/matrix-vector/matrix-matrix computation routines.
Sparse BLAS
– BLAS for sparse vectors/matrices
LAPACK (Linear algebra package)
– Solvers and eigensolvers. Many hundreds of routines total!
ScaLAPACK
– Computational, driver and auxiliary routines for distributed-memory architectures
DFTs (General FFTs)
– Mixed radix, multi-dimensional transforms
– Multithreaded
Cluster DFT
– For clusters of SMP systems
Intel® Math Kernel Library Contents
Sparse Solvers (PARDISO, DSS and ISS)
– For symmetric, structurally symmetric or non-symmetric, positive definite, indefinite or Hermitian sparse linear systems of equations
– OOC (out-of-core) version for huge problem sizes
VML (Vector Math Library)
– Set of vectorized transcendental functions: most of the libm functions, but faster
VSL (Vector Statistical Library)
– Set of vectorized random number generators
PDEs (Partial Differential Equations)
– Trigonometric transform and Poisson solvers
Optimization Solvers
– Solvers for nonlinear least squares problems with/without boundary constraints
GMP
– Arbitrary-precision arithmetic operations on integer numbers
Support Functions
Intel® Math Kernel Library Contents
Data types supported:
– Single precision Real and Complex
– Double precision Real and Complex
Examples
– C/C++, Fortran and now a few Java examples
Well documented
Intel® MKL Environment
Platform | Windows* | Linux* | Mac OS*
Compiler | Intel, CVF, Microsoft | Intel, GNU | Intel, GNU
Libraries | .lib, .dll | .a, .so | .a, .dylib
• 32-bit and 64-bit libraries to support 32-bit and 64-bit Intel® processors
• Static and runtime dynamic libraries
Interface availability by domain (columns: Fortran 77, Fortran 95/99, C/C++):
BLAS: * * Via CBLAS
Sparse BLAS Level 1: * *
Sparse BLAS Level 1&2: * * *
LAPACK: * *
ScaLAPACK: *
PARDISO: * *
DSS & ISS: * * *
VML/VSL: * *
FFT/Cluster FFT: * *
PDEs: * *
Optimization (TR) Solvers: * * *
Intel® MKL: BLAS
BLAS (Basic Linear Algebra Subprograms)
• Level 1 BLAS
– vector-vector operations
– dot products, swap, min, max, scaling, rotation, etc.
• Level 2 BLAS
– matrix-vector operations
– matrix-vector products, rank-1 and rank-2 updates, triangular solvers, etc.
• Level 3 BLAS
– matrix-matrix operations
– matrix-matrix products, rank-k and rank-2k updates, triangular solvers, etc.
• Sparse BLAS
– BLAS Level 1, 2 & 3 for sparse vectors and matrices
Matrix Storage Schemes:
• BLAS: full, packed and banded storage
• Sparse BLAS: CSR and its variations, CSC, coordinate, diagonal, skyline storage formats, BSR and its variations.
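To show what the CSR storage scheme listed above actually holds, here is a minimal sparse matrix-vector product over a 0-based, three-array CSR layout (values, column indices, row pointers). This is a plain-C sketch, not MKL's Sparse BLAS interface: MKL's own CSR routines have their own conventions (for example, 1-based indexing options and separate row-start/row-end arrays), and the name csr_spmv is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for an m-row sparse matrix A in 0-based CSR format:
   values[]  nonzero entries, stored row by row
   col_idx[] column index of each stored entry
   row_ptr[] row i occupies positions row_ptr[i] .. row_ptr[i+1]-1 */
void csr_spmv(size_t m, const double *values, const size_t *col_idx,
              const size_t *row_ptr, const double *x, double *y)
{
    assert(row_ptr != NULL);
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; p++)
            sum += values[p] * x[col_idx[p]];   /* only nonzeros are touched */
        y[i] = sum;
    }
}
```

For the 3 x 3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]], the arrays are values = {1, 2, 3, 4, 5}, col_idx = {0, 2, 1, 0, 2} and row_ptr = {0, 2, 3, 5}; the zeros are never stored or multiplied.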
Matrix Multiplication
Roll Your Own
    /* c must be zero-initialized: the loop accumulates into it */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            for (k = 0; k < N; k++) {
                c[N*i+j] += a[N*i+k] * b[N*k+j];
            }
        }
    }
Matrix Multiplication
ddot from BLAS Level 1
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            /* incx = 1 walks along row i of a; incy = N walks down column j of b */
            c[N*i+j] = cblas_ddot(N, &a[N*i], incx, &b[j], incy);
        }
    }
Matrix Multiplication
dgemv from BLAS Level 2
    for (i = 0; i < N; i++) {
        /* column i of c = alpha * a * (column i of b) + beta * (column i of c);
           columns of the row-major b and c are accessed with stride N */
        cblas_dgemv(CblasRowMajor, CblasNoTrans, N, N,
                    alpha, a, N, &b[i], N, beta, &c[i], N);
    }
Matrix Multiplication
dgemm from BLAS Level 3
    /* c = alpha * a * b + beta * c for row-major N x N matrices */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, N, N, N,
                alpha, a, N, b, N, beta, c, N);
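To make the alpha, beta and leading-dimension arguments of a row-major, no-transpose dgemm call concrete, here is a plain-C reference of what such a call computes. The name ref_dgemm_rowmajor is hypothetical. It is meant as a correctness baseline for Activity 1, not a substitute for MKL: it does none of the blocking or threading that makes the library fast.

```c
#include <assert.h>
#include <stddef.h>

/* Reference for c = alpha * a * b + beta * c with row-major storage:
   a is m x k, b is k x n, c is m x n; lda/ldb/ldc are the row strides
   (leading dimensions). As in reference BLAS, c is not read when beta == 0. */
void ref_dgemm_rowmajor(size_t m, size_t n, size_t k, double alpha,
                        const double *a, size_t lda,
                        const double *b, size_t ldb,
                        double beta, double *c, size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] = (beta == 0.0) ? alpha * sum
                                           : alpha * sum + beta * c[i * ldc + j];
        }
}
```

With alpha = 1 and beta = 0 this is a plain product, which matches the zero-initialized "roll your own" loop, so either can be used to check the BLAS variants.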
Activity 1: Matrix multiplication
Compare the performance of matrix multiply as implemented by C source code, DDOT, DGEMV and DGEMM.
Exercise control of the threading capabilities in MKL/BLAS.
Intel® MKL: LAPACK
Routines for:
– Solving systems of linear equations, factoring and inverting matrices, and estimating condition numbers.
– Solving least squares, eigenvalue and singular value problems, and Sylvester's equations.
– Auxiliary and utility tasks.
Driver routines: to solve a particular problem, either call two or more computational routines or call a driver routine that combines several tasks in one call.
Most important LAPACK optimizations:
• Recursive factorization
– Reduces scalar time (Amdahl’s law: t = t_scalar + t_parallel / p, where p is the number of processors)
– Extends blocking further into the code
No runtime library support required
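Amdahl's law as stated above explains why reducing scalar time matters more than adding processors. The small helper below (hypothetical name amdahl_speedup) evaluates the speedup t(1) / t(p) from that formula; it is a worked illustration of the equation, not anything provided by MKL.

```c
#include <assert.h>

/* Amdahl's law: t(p) = t_scalar + t_parallel / p.
   Returns the speedup t(1) / t(p) on p processors. */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    assert(p >= 1 && t_scalar >= 0.0 && t_parallel >= 0.0);
    assert(t_scalar + t_parallel > 0.0);
    double t1 = t_scalar + t_parallel;        /* run time on one processor */
    double tp = t_scalar + t_parallel / p;    /* scalar part never shrinks */
    return t1 / tp;
}
```

For example, with t_scalar = 1 and t_parallel = 9, three processors give 10 / 4 = 2.5, and no number of processors can exceed 10 / 1 = 10. Shrinking the scalar part, as recursive factorization does, raises that ceiling.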
Intel® MKL: ScaLAPACK
• LAPACK for distributed-memory architectures
• Uses MPI, BLACS and a set of BLAS
• Uses 2D block-cyclic data distribution for dense matrix computations, which helps:
– Better work balance between available processors
– Use of BLAS Level 3 for optimized local computations
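The block-cyclic distribution above deals fixed-size blocks to processes round-robin. A 2D layout applies the same 1D rule independently to rows (over the process-grid rows) and columns (over the process-grid columns). The sketch below computes, for one dimension, which process owns a global index and where it lands locally. It assumes 0-based indices and that the first block sits on process 0 (ScaLAPACK's source-process offset of 0). The names bc_loc and block_cyclic_map are hypothetical, not ScaLAPACK routines.

```c
#include <assert.h>
#include <stddef.h>

/* Location of one global index under a 1D block-cyclic distribution. */
typedef struct {
    size_t proc;   /* process coordinate that owns the global index */
    size_t local;  /* index within that process's local array */
} bc_loc;

/* Block size nb, nprocs processes along this dimension, first block on
   process 0. Apply once for the row index and once for the column index
   to get the 2D block-cyclic owner and local position. */
bc_loc block_cyclic_map(size_t global, size_t nb, size_t nprocs)
{
    assert(nb > 0 && nprocs > 0);
    size_t block = global / nb;                      /* which block */
    bc_loc loc;
    loc.proc  = block % nprocs;                      /* dealt round-robin */
    loc.local = (block / nprocs) * nb + global % nb; /* earlier local blocks + offset */
    return loc;
}
```

With nb = 2 and 3 processes, global indices 0 to 11 are owned by processes 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, so every process gets work from both the start and the end of the matrix. This is the load-balance benefit listed above.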
Intel® MKL: BLACS
The BLACS routines implemented in Intel MKL are of four categories:
• Combines
• Point-to-point communication
• Broadcast
• Support
Intel® MKL: Sparse Solvers
User-callable linear sparse solvers:
PARDISO: Parallel Direct Sparse Solver
– For SMP systems
– High performance, robust and memory efficient
– Based on Level-3 BLAS update and pipelining parallelism
– OOC (out-of-core) version for huge problem sizes
DSS: Direct Sparse Solver interface to PARDISO
– Alternative to PARDISO
– Steps: Create -> Define array structure -> Reorder -> Factor -> Solve -> Delete
ISS: Iterative Sparse Solver
– RCI (Reverse Communication Interface) based
– For symmetric positive definite and for non-symmetric indefinite systems
Intel® MKL: Vector Math Library (VML)
Highly optimized implementations of computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, etc.)
Operates on whole vectors, unlike libm, which works on one scalar at a time.
Multiple accuracy modes:
• High accuracy (HA): ~53 bits accurate
• Lower accuracy (LA), faster: ~51 bits accurate
• Enhanced Performance (EP): ~26 bits accurate
Special value handling: √(-a), sin(0), and so on
Can improve the performance of applications such as non-linear programming and integral computations.
Intel® MKL: Vector Statistical Library (VSL)
Functions for:
• Generating vectors of pseudorandom and quasi-random numbers
• Convolution & Correlation
• Parallel computation support (some functions)
• User can supply own BRNG or transformations
Basic RNGs
– Pseudo RNGs: MCG31, GFSR250, MRG32, MCG59, WH, MT19937, MT2203
– Quasi RNGs: Sobol, Niederreiter
Distribution Generators
– Continuous: Uniform, Gaussian (two methods), Exponential, Laplace, Weibull, Cauchy, Rayleigh, Lognormal, Gumbel, Gamma, Beta
– Discrete: Uniform, UniformBits, Bernoulli, Geometric, Binomial, Hypergeometric, Poisson, PoissonV, NegBinomial
Performance Comparison of Random Number Generator (Intel Xeon 5300)
Generator | Running Time (s) | Speedup
Standard rand() | 40.52 | 1
Intel MKL VSL RNG | 6.88 | 5.89
MKL + OpenMP* version (8 threads) | 0.92 | 44.04
Configuration Info:
• Quad-Core Intel® Xeon® processor 5300 series
• 2.4 GHz, 2x8MB L2 cache, 4 GB memory
• Windows Server* 2003 Enterprise x64 Edition
• Test run on a vector of 1000 elements
• Intel MKL 10.0 and Intel® C++ Compiler 10.1
Excellent Multi-core Scaling
Using VSL
Basically a 3-step process (plus an optional cleanup step):
1. Create a stream pointer.
    VSLStreamStatePtr stream;
2. Create a stream.
    vslNewStream(&stream, VSL_BRNG_MCG31, seed);
3. Generate a set of random numbers (size doubles, uniformly distributed between start and end, written to out).
    vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, size, out, start, end);
4. Delete the stream (optional).
    vslDeleteStream(&stream);
Activity 2: Calculating Pi using a Monte Carlo method
Compare the performance of C source code (rand() function) and VSL.
Exercise control of the threading capabilities in MKL/VSL.
Intel® MKL: Fast Fourier Transform (FFT)
• 1, 2 & 3 dimensional
• Multithreaded
• Mixed radix
• User-specified scaling, transform sign
• Multiple one-dimensional transforms on a single call
• Strides
• Supports FFTW interface through wrappers
Using the Intel® MKL DFTs
Basically a 3-step process:
1. Create a descriptor.
    Status = DftiCreateDescriptor(&MDH, …)
2. Commit the descriptor (instantiates it).
    Status = DftiCommitDescriptor(MDH)
3. Perform the transform.
    Status = DftiComputeForward(MDH, X)
Optionally free the descriptor with DftiFreeDescriptor(&MDH).
MDH: MyDescriptorHandle, a variable of type DFTI_DESCRIPTOR_HANDLE
Intel® MKL: Cluster FFT
• FFT for clusters of SMP systems
• Works with MPI using BLACS
• 1, 2, 3 and multidimensional
• Requires basic MPI programming skills
• Interface similar to the DFT from standard MKL
Intel® MKL: Partial Differential Equations
• Poisson Library
– for fast solving of simple Helmholtz, Poisson, and Laplace problems
• Trigonometric Transform interface routines
1. Initialize: ?_init_trig_transform
2. Change routine parameters manually (if needed)
3. Commit: ?_commit_trig_transform
4. Forward/Backward transform: ?_forward_trig_transform / ?_backward_trig_transform
5. Free: free_trig_transform
Intel® MKL: Optimization Solvers
Optimization solver routines for:
– solving nonlinear least squares problems without constraints
– solving nonlinear least squares problems with boundary constraints
– computing the Jacobian matrix by central differences for solving nonlinear least squares problems
Based on Trust Region (TR) methods.
– TR strength: global and super-linear convergence, which differentiates TR methods from first-order methods and unmodified Newton methods.
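The central-difference Jacobian mentioned above approximates each column j as J[i][j] ≈ (f_i(x + h e_j) - f_i(x - h e_j)) / (2h). The sketch below is a plain-C illustration of that formula, not MKL's Jacobian routine or its calling convention. The callback type vec_fn, the functions jacobian_central and example_f, and the caller-supplied work buffer are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: evaluates f: R^n -> R^m at x, writing f(x) to fx. */
typedef void (*vec_fn)(const double *x, double *fx);

/* Central-difference Jacobian, J is m x n row-major:
   J[i][j] ~= (f_i(x + h e_j) - f_i(x - h e_j)) / (2h).
   work must hold 2*m doubles; x is perturbed in place and restored. */
void jacobian_central(vec_fn f, size_t m, size_t n, double *x, double h,
                      double *J, double *work)
{
    assert(h > 0.0 && work != NULL);
    double *fp = work, *fm = work + m;
    for (size_t j = 0; j < n; j++) {
        double xj = x[j];
        x[j] = xj + h;
        f(x, fp);                     /* f(x + h e_j) */
        x[j] = xj - h;
        f(x, fm);                     /* f(x - h e_j) */
        x[j] = xj;                    /* restore the original point */
        for (size_t i = 0; i < m; i++)
            J[i * n + j] = (fp[i] - fm[i]) / (2.0 * h);
    }
}

/* Example function for illustration: f(x) = (x0^2, x0 * x1). */
void example_f(const double *x, double *fx)
{
    fx[0] = x[0] * x[0];
    fx[1] = x[0] * x[1];
}
```

At x = (3, 2), the exact Jacobian of example_f is [[6, 0], [2, 3]]. Central differences reproduce it up to rounding because the error term is O(h^2) and vanishes for quadratics. Each column costs two function evaluations, so the Jacobian needs 2n calls to f.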
Intel® MKL: GMP
• Arbitrary-precision arithmetic routines on integers
• Interface fully matches the GNU Multiple Precision (GMP) library
• If your application uses GMP functions, link with the MKL and libm libraries.
– For example, on IA-32: $CC prog.c -L$MKL_LIB_PATH -lmkl_intel -lmkl_core -liomp5 -lpthread -lm
• Optimized for Intel processors
Intel® MKL: Support Functions
Intel® MKL support functions are used to:
– retrieve information about the current Intel MKL version
– additionally control the number of threads
– handle errors
– test characters and character strings for equality
– measure user time for a process and elapsed CPU time
– set and measure CPU frequency
– free memory allocated by Intel MKL memory management software
Linking with Intel® MKL
• Static linking
• Dynamic linking
• Custom dynamic linking
Quick Comparison of Intel MKL Linkage Models
Feature | Dynamic Linkage | Static Linkage | Custom Dynamic Linkage
Processor Updates | Automatic | Automatic | Recompile and redistribute
Optimization | All Processors | All Processors | All Processors
Build | Link to import libraries | Link to static libraries | Build separate import libraries, which are created automatically
Calling | Regular Names | Regular Names | Modified Names
Total Binary Size | Large | Small | Small
Executable Size | Smallest | Small | Smallest
Multi-threaded / thread safe | Yes | Yes | Yes
Linking with Intel® MKL (contd.)

Layered model approach for better control:
• Interface layer: compiler interfaces (Intel / GNU); LP64 / ILP64 interfaces
• Threading layer: threaded (Intel or alternate OpenMP) or sequential computation
• Computational layer
• Run-time layer
Choose the libraries from each layer when linking.
Ex 1: Static linking using the Intel® Fortran Compiler, BLAS, and an Intel® 64 processor on Linux
$ ifort myprog.f libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a libiomp5.so
Ex 2: Dynamic linking with the Intel® C++ Compiler on Windows
c:\>icl myprog.c mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib
Note: It is strongly recommended to link the run-time layer library (for example, libiomp5) dynamically.
Intel® MKL Threading

There are numerous opportunities for threading:
• Level 3 BLAS (O(n^3))
• LAPACK* (O(n^3))
• FFTs (O(n log n))
• VML, VSL: depends on processor and function
Not threaded for some routines because:
– the limiting resource is memory bandwidth
– threading level 1 and level 2 BLAS is mostly ineffective, since their work grows only as fast as the data they touch (O(n) for level 1, O(n^2) for level 2)

• Threaded using OpenMP*
– with support for GCC* and Microsoft* OpenMP*
• ScaLAPACK and Cluster FFT are SMP parallel
• All Intel® MKL is thread-safe
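A rough count of floating-point operations per element moved to or from memory shows why level 1 and level 2 BLAS gain little from threads. This assumes n×n matrices and length-n vectors, each element read or written once, with AXPY, GEMV, and GEMM taken as representatives of the three levels:

```latex
\begin{aligned}
\text{Level 1 (AXPY, } y \leftarrow \alpha x + y\text{):}\quad & \frac{2n}{3n} = \frac{2}{3} = O(1)\\
\text{Level 2 (GEMV, } y \leftarrow A x + y\text{):}\quad & \frac{2n^2}{n^2 + 3n} \to 2 = O(1)\\
\text{Level 3 (GEMM, } C \leftarrow A B + C\text{):}\quad & \frac{2n^3}{4n^2} = \frac{n}{2} = O(n)
\end{aligned}
```

Only level 3 performs work that grows faster than the data it touches, so in levels 1 and 2 extra cores mostly compete for the same memory bandwidth.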
Threading Control in Intel® MKL

Set an OpenMP or Intel MKL environment variable:
– OMP_NUM_THREADS
– MKL_NUM_THREADS
– MKL_DOMAIN_NUM_THREADS

Or call the OpenMP or Intel MKL function:
– omp_set_num_threads()
– mkl_set_num_threads()
– mkl_domain_set_num_threads()
MKL_DYNAMIC / mkl_set_dynamic(): when enabled, Intel® MKL decides the number of threads and may use fewer than requested.

Example: you could configure Intel MKL to run 4 threads for BLAS but run sequentially in all other parts of the library.
• Environment variable:
set MKL_DOMAIN_NUM_THREADS="MKL_ALL=1, MKL_BLAS=4"
• Function calls:
mkl_domain_set_num_threads(1, MKL_ALL);
mkl_domain_set_num_threads(4, MKL_BLAS);
Performance Libraries: Intel® MKL
What's Been Covered

• Intel® Math Kernel Library is a broad scientific/engineering math library.
• It is optimized for Intel® processors.
• It is threaded for effective use on multi-core and SMP machines.
References

Intel® MKL product information
– www.intel.com/software/products/mkl
Technical issues/questions/feedback
– http://premier.intel.com/
Self-help
– http://www.intel.com/software/products/mkl (click the "Support Resources" tab)
User discussion forum
– http://softwareforums.intel.com/ids/board?board.id=MKL
What are the new software tools?
– http://whatif.intel.com