Date post: 06-Jul-2020

Session: Intel® Performance Libraries: Intel® Math Kernel Library (MKL)

INTEL CONFIDENTIAL

Intel® Academic Community

Agenda

• Intel MKL purpose
• Why is Intel MKL faster?
• Overview of MKL
• Intel MKL environment
• The library sections
• Linking with Intel MKL
• Threading in Intel MKL

SOFTWARE AND SERVICES

Copyright © 2014, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Intel® Math Kernel Library Purpose

Performance, Performance, Performance!
Intel's engineering, scientific, and financial math library
Addresses:

• Solvers (BLAS, LAPACK)
• Eigenvector/eigenvalue solvers (BLAS, LAPACK)
• Some quantum chemistry needs (dgemm)
• PDEs, signal processing, seismic, solid-state physics (FFTs)
• General scientific, financial [vector transcendental functions (VML) and vector random number generators (VSL)]
• Sparse solvers (PARDISO, DSS & ISS)

Tuned for Intel® processors, current and future

Application Areas Which Could Use MKL

Energy - Reservoir simulation, Seismics, Electromagnetics, etc.
Finance - Options pricing, Mortgage pricing, Financial portfolio management, etc.
Manufacturing - CAD, FEA, etc.
Applied mathematics
• Linear programming, Quadratic programming, Boundary value problems, Nonlinear parameter estimation, Homotopy calculations, Curve and surface fitting, Numerical integration, Fixed-point methods, Partial and ordinary differential equations, Statistics, Optimal control and system theory

Physics & Computer science
• Spectroscopy, Fluid dynamics, Optics, Geophysics, seismology, and hydrology, Electromagnetism, Neural network training, Computer vision, Motion estimation and robotics

Chemistry
• Physical chemistry, Chemical engineering, Study of transition states, Chemical kinetics, Molecular modeling, Crystallography, Mass transfer, Speciation

Engineering
• Structural engineering, Transportation analysis, Energy distribution networks, Radar applications, Modeling and mechanical design, Circuit design

Biology and medicine
• Magnetic resonance applications, Rheology, Pharmacokinetics, Computer-aided diagnostics, Optical tomography

Economics and sociology
• Random utility models, Game theory and international negotiations, Financial portfolio management

Why is Intel MKL faster?

Optimization done for maximum speed.
Resource-limited optimization: exhaust one or more resources of the system:
– CPU: Register use, FP units.
– Cache: Keep data in cache as long as possible; deal with cache interleaving.
– TLBs: Maximally use data on each page.
– Memory bandwidth: Minimally access memory.
– Computer: Use all the processor cores available using threading.
– System: Use all the nodes available.
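
The cache point can be made concrete with loop tiling (cache blocking). The sketch below is plain C, not Intel MKL's implementation (MKL's kernels additionally exploit registers, vector units and threads); it only shows how working on bs x bs tiles lets the pieces of A, B and C being combined stay in cache while they are reused. The function names and the block-size parameter `bs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naive triple loop: C = A * B for n x n row-major matrices.
   C must be zero-initialized by the caller. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[n*i + j] += a[n*i + k] * b[n*k + j];
}

/* Cache-blocked version: iterate over bs x bs tiles so each tile is
   reused many times while it is still in cache.
   C must be zero-initialized by the caller. */
void matmul_blocked(size_t n, size_t bs, const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t imax = ii + bs < n ? ii + bs : n;
                size_t kmax = kk + bs < n ? kk + bs : n;
                size_t jmax = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[n*i + k];   /* kept in a register across the j loop */
                        for (size_t j = jj; j < jmax; j++)
                            c[n*i + j] += aik * b[n*k + j];
                    }
            }
}
```

Both functions compute the same product; only the order of memory accesses differs. A good `bs` depends on the cache sizes of the target processor, which is exactly the kind of tuning a library like MKL does for you.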

BLAS Performance: Multiple Threads

• Performance (DGEMM function)
• Excellent scaling on multiprocessors
• Intel MKL performs far better than ATLAS* on multi-core
– 100% better than ATLAS
– 200% better for larger matrices

Intel may make changes to specification, product descriptions, and plans at any time, without notice. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products or call (U.S.) 1-800-628-8686 or 1-916-356-3104

Intel® Math Kernel Library Contents

BLAS
– Basic vector-vector/matrix-vector/matrix-matrix computation routines.
Sparse BLAS
– BLAS for sparse vectors/matrices
LAPACK (Linear algebra package)
– Solvers and eigensolvers. Many hundreds of routines total!
ScaLAPACK
– Computational, driver and auxiliary routines for distributed-memory architectures
DFTs (General FFTs)
– Mixed radix, multi-dimensional transforms
– Multi-threaded
Cluster DFT
– For clusters of SMP systems

Intel® Math Kernel Library Contents

Sparse Solvers (PARDISO, DSS and ISS)
– For symmetric, structurally symmetric or non-symmetric, positive definite, indefinite or Hermitian sparse linear systems of equations
– OOC (out-of-core) version for huge problem sizes
VML (Vector Math Library)
– Set of vectorized transcendental functions: most of the libm functions, but faster
VSL (Vector Statistical Library)
– Set of vectorized random number generators
PDEs (Partial Differential Equations)
– Trigonometric transform and Poisson solvers
Optimization Solvers
– Solvers for nonlinear least squares problems with/without boundary constraints
GMP
– Arbitrary-precision arithmetic operations on integer numbers
Support Functions

I t l® M th K l Lib C t tIntel® Math Kernel Library Contents® y

Data types supported:Data types supported:Single precision Real and Complex– Single precision Real and ComplexD bl i i R l d C l– Double precision Real and Complex

ExamplesC/C++, Fortran and now a few Java examples C/C++, Fortran and now a few Java examples

Well documentedWell documented

Intel® MKL Environment

OS | Windows* | Linux* | Mac OS*
Compiler | Intel, CVF, Microsoft | Intel, GNU | Intel, GNU
Libraries | .lib, .dll | .a, .so | .a, .dylib

• 32-bit and 64-bit libraries to support 32-bit and 64-bit Intel® processors
• Static and runtime dynamic libraries

Domain   Fortran 77   Fortran 95/99   C/C++
BLAS * * Via CBLAS
Sparse BLAS Level 1 * *
Sparse BLAS Level 2 & 3 * * *
LAPACK * *
ScaLAPACK *
PARDISO * *
DSS & ISS * * *
VML/VSL * *
FFT/Cluster FFT * *
PDEs * *
Optimization (TR) Solvers * * *

Intel® MKL: BLAS

BLAS (Basic Linear Algebra Subroutines)
• Level 1 BLAS
– vector-vector operations
– dot products, swap, min, max, scaling, rotation, etc.
• Level 2 BLAS
– matrix-vector operations
– matrix-vector products, rank-1 and rank-2 updates, triangular solvers, etc.
• Level 3 BLAS
– matrix-matrix operations
– matrix-matrix products, rank-k and rank-2k updates, triangular solvers, etc.
• Sparse BLAS
– BLAS Level 1, 2 & 3 for sparse vectors and matrices

Matrix Storage Schemes:
• BLAS: full, packed and banded storage
• Sparse BLAS: CSR and its variations, CSC, coordinate, diagonal and skyline storage formats, BSR and its variations
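
To illustrate what the CSR (compressed sparse row) scheme stores, here is a plain-C sparse matrix-vector product. It assumes the zero-based three-array variant (values, column indices, row pointers); MKL also accepts other CSR variations, and `csr_matvec` is a hypothetical name, not an MKL routine.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for an m-row sparse matrix A in zero-based CSR format:
   values[]  : the nonzero entries, stored row by row
   col_idx[] : column index of each nonzero
   row_ptr[] : row i occupies positions row_ptr[i] .. row_ptr[i+1]-1
               (m+1 entries; row_ptr[m] equals the number of nonzeros) */
void csr_matvec(size_t m, const double *values, const size_t *col_idx,
                const size_t *row_ptr, const double *x, double *y)
{
    assert(row_ptr[0] == 0);
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; p++)
            sum += values[p] * x[col_idx[p]];   /* only nonzeros are touched */
        y[i] = sum;
    }
}
```

For example, the 3 x 3 matrix with rows (1 0 2), (0 3 0), (4 0 5) is stored as values = {1, 2, 3, 4, 5}, col_idx = {0, 2, 1, 0, 2}, row_ptr = {0, 2, 3, 5}; multiplying by x = (1, 2, 3) gives y = (7, 6, 19).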

Matrix Multiplication

Roll Your Own

    /* c must be zero-initialized before the loops */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            for (k = 0; k < N; k++) {
                c[N*i+j] += a[N*i+k] * b[N*k+j];
            }
        }
    }

Matrix Multiplication

ddot from BLAS Level 1

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            /* row i of a is contiguous (incx = 1); column j of b has stride N (incy = N) */
            c[N*i+j] = cblas_ddot(N, &a[N*i], incx, &b[j], incy);
        }
    }
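
The only subtle part of the ddot version is the stride arithmetic: a row of `a` is contiguous, while a column of row-major `b` is spaced `N` elements apart. The plain-C sketch below spells that out without needing MKL; `ref_ddot` and `matmul_dot` are hypothetical names, and `ref_ddot` only mirrors the documented meaning of cblas_ddot's `incx`/`incy` for positive increments.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for a strided dot product (positive increments only):
   returns the sum over t of x[t*incx] * y[t*incy]. */
double ref_ddot(size_t n, const double *x, size_t incx, const double *y, size_t incy)
{
    assert(incx > 0 && incy > 0);
    double sum = 0.0;
    for (size_t t = 0; t < n; t++)
        sum += x[t * incx] * y[t * incy];
    return sum;
}

/* C = A * B (n x n, row-major) with one dot product per element of C. */
void matmul_dot(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            c[n*i + j] = ref_ddot(n, &a[n*i], 1, &b[j], n);   /* row i . column j */
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8} (both 2 x 2), `matmul_dot` yields {19, 22, 43, 50}, the same result as the triple loop.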

Matrix Multiplication

dgemv from BLAS Level 2

    /* alpha = 1.0, beta = 0.0: column i of c = a * (column i of b) */
    for (i = 0; i < N; i++) {
        cblas_dgemv(CblasRowMajor, CblasNoTrans, N, N,
                    alpha, a, N, &b[i], N, beta, &c[i], N);
    }
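
The dgemv loop computes one column of `c` per call: `&b[i]` with increment `N` walks column i of `b`, and `&c[i]` with increment `N` writes column i of `c`. The plain-C sketch below reproduces that access pattern so it can be checked without MKL. `ref_dgemv_rowmajor` and `matmul_gemv` are hypothetical names; the beta == 0 rule follows the reference BLAS convention that y is then not read.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for a row-major, non-transposed gemv:
   y = alpha * A * x + beta * y, A is m x n with leading dimension lda. */
void ref_dgemv_rowmajor(size_t m, size_t n, double alpha, const double *a, size_t lda,
                        const double *x, size_t incx, double beta, double *y, size_t incy)
{
    assert(lda >= n && incx > 0 && incy > 0);
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i*lda + j] * x[j*incx];
        /* beta == 0: overwrite y without reading it (it may be uninitialized) */
        y[i*incy] = (beta == 0.0) ? alpha * sum : alpha * sum + beta * y[i*incy];
    }
}

/* C = A * B column by column; columns of row-major n x n matrices have stride n. */
void matmul_gemv(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        ref_dgemv_rowmajor(n, n, 1.0, a, n, &b[i], n, 0.0, &c[i], n);
}
```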

Matrix Multiplication

dgemm from BLAS Level 3

    /* C = alpha*A*B + beta*C; with alpha = 1.0, beta = 0.0 this is C = A*B,
       matching the previous versions (passing b before a would compute B*A) */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, N, N, N,
                alpha, a, N, b, N, beta, c, N);

Activity 1: Matrix multiplication

Compare the performance of matrix multiply as implemented by C source code, DDOT, DGEMV and DGEMM.

Exercise control of the threading capabilities in MKL/BLAS.
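
A minimal timing helper for this activity, assuming a C11 compiler (`timespec_get` with `TIME_UTC`). Wall-clock time is used rather than `clock()`, because CPU time summed over threads would hide the benefit of threading. The names `wall_seconds` and `matmul_gflops` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* An n x n by n x n multiply performs n^3 multiplies and n^3 adds,
   so its rate is 2 n^3 / t flop/s; returned here in GFLOP/s. */
double matmul_gflops(size_t n, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)n * (double)n * (double)n / seconds * 1e-9;
}
```

Usage: record `t0 = wall_seconds()`, run one variant (triple loop, ddot, dgemv or dgemm), and report `matmul_gflops(N, wall_seconds() - t0)`. For the threading part, Intel MKL reads the `MKL_NUM_THREADS` (or `OMP_NUM_THREADS`) environment variable, and `mkl_set_num_threads()` sets the thread count from code; rerunning the dgemm timing with 1, 2, 4, ... threads shows the scaling.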

Intel® MKL: LAPACK

Routines for:
– Solving systems of linear equations, factoring and inverting matrices, and estimating condition numbers.
– Solving least squares, eigenvalue and singular value problems, and Sylvester's equations.
– Auxiliary and utility tasks.

Driver routines: to solve a particular problem, call two or more computational routines, or call a driver routine that combines several tasks in one call.

Most important LAPACK optimizations:
• Recursive factorization
– Reduces scalar time (Amdahl's law: $t = t_{\mathrm{scalar}} + t_{\mathrm{parallel}}/p$, where $p$ is the number of processors)
– Extends blocking further into the code

No runtime library support required
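
Evaluating the Amdahl formula directly shows why reducing scalar time pays off. The fractions used below (10% versus 2% scalar work on 8 processors) are illustrative inputs, not measurements of LAPACK; the function names are hypothetical.

```c
#include <assert.h>

/* Amdahl's law as written on the slide: t(p) = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p)
{
    assert(p >= 1);
    return t_scalar + t_parallel / p;
}

/* Speedup on p processors relative to one processor: t(1) / t(p). */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    return amdahl_time(t_scalar, t_parallel, 1) / amdahl_time(t_scalar, t_parallel, p);
}
```

With a total time of 1, `amdahl_speedup(0.10, 0.90, 8)` is about 4.71, while `amdahl_speedup(0.02, 0.98, 8)` is about 7.02: shrinking the scalar part from 10% to 2% raises the 8-processor speedup by roughly half.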

Intel® MKL: ScaLAPACK

• LAPACK for distributed-memory architectures
• Using MPI, BLACS and a set of BLAS
• Uses 2D block-cyclic data distribution for dense matrix computations, which helps:
– Better work balance between available processors
– Use of BLAS Level 3 for optimized local computations
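
The block-cyclic rule itself is simple: blocks of `nb` consecutive indices are dealt round-robin to the processes, and a 2D distribution applies the rule separately to rows (over process-grid rows) and columns (over process-grid columns). The sketch below assumes zero-based indices and that the first block lives on process 0; ScaLAPACK's own tool routines (INDXG2P, INDXG2L) perform the same mapping with one-based indices and a configurable source process. The C function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 1-D block-cyclic: which process owns global index g? */
size_t bc_owner(size_t g, size_t nb, size_t nprocs)
{
    assert(nb > 0 && nprocs > 0);
    return (g / nb) % nprocs;
}

/* 1-D block-cyclic: position of global index g in its owner's local array. */
size_t bc_local_index(size_t g, size_t nb, size_t nprocs)
{
    assert(nb > 0 && nprocs > 0);
    return (g / (nb * nprocs)) * nb + g % nb;
}

/* 2-D block-cyclic: rows use block size mb over prow process rows,
   columns use block size nb over pcol process columns. */
void bc2d_map(size_t gi, size_t gj, size_t mb, size_t nb, size_t prow, size_t pcol,
              size_t *owner_row, size_t *owner_col, size_t *li, size_t *lj)
{
    *owner_row = bc_owner(gi, mb, prow);
    *owner_col = bc_owner(gj, nb, pcol);
    *li = bc_local_index(gi, mb, prow);
    *lj = bc_local_index(gj, nb, pcol);
}
```

With `nb = 2` and 2 processes, indices 0..7 are owned by processes 0, 0, 1, 1, 0, 0, 1, 1, so every process receives blocks from all parts of the matrix. That spread is what gives the better work balance as a factorization moves through the matrix.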

Intel® MKL: BLACS

The BLACS routines implemented in Intel MKL are of four categories:
• Combines
• Point-to-Point Communication
• Broadcast
• Support

Intel® MKL: Sparse Solvers

User-callable linear sparse solvers:

PARDISO: Parallel Direct Sparse Solver
– For SMP systems
– High performance, robust and memory efficient
– Based on Level-3 BLAS update and pipelining parallelism
– OOC version for huge problem sizes

DSS: Direct Sparse Solver interface to PARDISO
– Alternative to PARDISO
– Steps: Create -> Define array structure -> Reorder -> Factor -> Solve -> Delete

ISS: Iterative Sparse Solver
– RCI (reverse communication interface) based
– For symmetric positive definite and for non-symmetric indefinite systems
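
In an RCI (reverse communication interface) solver, the library never calls back into user code: it returns a request code, the caller performs the requested operation (typically a matrix-vector product with its own matrix format), and then calls the solver again. The sketch below shows that control flow with a simple Richardson iteration, x <- x + omega * (b - A x). It is not MKL's ISS: the struct, the request codes and `rci_step` are hypothetical, and real ISS routines use their own request codes and parameter arrays.

```c
#include <assert.h>
#include <stddef.h>

enum rci_request { RCI_DONE = 0, RCI_NEED_MATVEC = 1 };

struct rci_state {
    size_t n;
    double omega, tol;
    int iter, max_iter;
    double *x;          /* current iterate (caller-owned) */
    double *ax;         /* caller writes A * x here on RCI_NEED_MATVEC */
    const double *b;    /* right-hand side */
};

/* One step of a reverse-communication Richardson solver. The solver never
   sees A: it asks the caller for A * x and consumes the result next call. */
enum rci_request rci_step(struct rci_state *s)
{
    assert(s->n > 0);
    if (s->iter > 0) {                    /* s->ax now holds A * x */
        double res2 = 0.0;
        for (size_t i = 0; i < s->n; i++) {
            double r = s->b[i] - s->ax[i];
            res2 += r * r;
            s->x[i] += s->omega * r;      /* Richardson update */
        }
        if (res2 < s->tol * s->tol || s->iter >= s->max_iter)
            return RCI_DONE;
    }
    s->iter++;
    return RCI_NEED_MATVEC;               /* ask the caller for A * x */
}
```

Usage: the caller loops `while (rci_step(&s) == RCI_NEED_MATVEC) { /* compute s.ax = A * s.x */ }`. For A = [[4, 1], [1, 3]], b = (1, 2) and omega = 0.2 this converges to x = (1/11, 7/11).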

Intel® MKL: Vector Math Library (VML)

Highly optimized implementations of computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, etc.)

Operates on a vector, unlike libm.

Multiple accuracy modes:
• High accuracy (HA): ~53 bits accurate
• Lower accuracy (LA), faster: ~51 bits accurate
• Enhanced Performance (EP): ~26 bits accurate

Special value handling: √(-a), sin(0), and so on.
Can improve the performance of applications such as non-linear programming and computation of integrals.

Intel® MKL: Vector Statistical Library (VSL)

Functions for:
• Generating vectors of pseudorandom and quasi-random numbers
• Convolution & correlation
Parallel computation support: some functions
User can supply own BRNG or transformations

Basic RNGs:
• Pseudo RNGs: MCG31, GFSR250, MRG32, MCG59, WH, MT19937, MT2203
• Quasi RNGs: Sobol, Niederreiter

Distribution generators:
• Continuous: Uniform, Gaussian (two methods), Exponential, Laplace, Weibull, Cauchy, Rayleigh, Lognormal, Gumbel, Gamma, Beta
• Discrete: Uniform, UniformBits, Bernoulli, Geometric, Binomial, Hypergeometric, Poisson, PoissonV, NegBinomial

Performance Comparison of Random Number Generators

Intel Xeon 5300 | Running Time (s) | Speedup
Standard rand() | 40.52 | 1
Intel MKL VSL RNG | 6.88 | 5.89
MKL + OpenMP* version (8 threads) | 0.92 | 44.04

Excellent multi-core scaling

Configuration info:
• Quad-Core Intel® Xeon® processor 5300 series
• 2.4 GHz, 2x8MB L2 cache, 4 GB memory
• Windows Server* 2003 Enterprise x64 Edition
• Intel MKL 10.0 and Intel® C++ Compiler 10.1
• Test run on a vector of 1000 elements

Using VSL

Basically a 3-step process, plus optional cleanup:

1. Create a stream pointer.
    VSLStreamStatePtr stream;

2. Create a stream.
    vslNewStream(&stream, VSL_BRNG_MCG31, seed);

3. Generate a set of random numbers (uniform on [start, end)).
    vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, size, out, start, end);

4. Delete a stream (optional).
    vslDeleteStream(&stream);
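
For readers without MKL at hand, the same create, generate, delete lifecycle can be mimicked in plain C with a 31-bit multiplicative congruential generator, x_n = A * x_(n-1) mod M. The names `mcg31_*` are hypothetical, and the constants A = 1132489760, M = 2^31 - 1 are assumed here to match MKL's documented MCG31m1 parameters; the output is not guaranteed to match VSL_BRNG_MCG31 bit for bit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed MCG31m1-style parameters; this is NOT MKL's implementation. */
#define MCG31_A 1132489760ULL
#define MCG31_M 2147483647ULL              /* 2^31 - 1, prime */

typedef struct { uint64_t x; } mcg31_stream;

/* "Create a stream": allocate state and seed it (0 is a fixed point, so avoid it). */
mcg31_stream *mcg31_new(uint32_t seed)
{
    mcg31_stream *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->x = seed % MCG31_M;
    if (s->x == 0) s->x = 1;
    return s;
}

/* "Generate": fill r[0..n-1] with uniform values in [lo, hi). */
void mcg31_uniform(mcg31_stream *s, size_t n, double *r, double lo, double hi)
{
    assert(s != NULL && lo < hi);
    for (size_t i = 0; i < n; i++) {
        s->x = (MCG31_A * s->x) % MCG31_M;  /* product < 2^62, fits in 64 bits */
        r[i] = lo + (hi - lo) * ((double)s->x / (double)MCG31_M);
    }
}

/* "Delete a stream": free state and clear the caller's pointer. */
void mcg31_delete(mcg31_stream **s)
{
    free(*s);
    *s = NULL;
}
```

As with a VSL stream, one call fills a whole vector, and two streams created with the same seed produce the same sequence.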

Activity 2: Calculating Pi using a Monte Carlo method

Compare the performance of C source code (rand() function) and VSL.
Exercise control of the threading capabilities in MKL/VSL.
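
A sketch of the C-source baseline for this activity: sample points uniformly in the unit square and count those inside the quarter circle x^2 + y^2 <= 1, whose area is pi/4. The function name is hypothetical. In a VSL version, the two `rand()` calls per point would typically be replaced by one call that fills a whole vector of uniform numbers from a stream.

```c
#include <assert.h>
#include <stdlib.h>

/* Estimate pi from n random points in [0,1] x [0,1] using the C library rand(). */
double estimate_pi_rand(long n, unsigned seed)
{
    assert(n > 0);
    srand(seed);
    long inside = 0;
    for (long i = 0; i < n; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)        /* point lies inside the quarter circle */
            inside++;
    }
    return 4.0 * (double)inside / (double)n;   /* area ratio is pi/4 */
}
```

The statistical error shrinks only like 1/sqrt(n), so useful accuracy needs many samples. That is why generator speed and threading, the subject of this activity, matter.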

Intel® MKL: Fast Fourier Transform (FFT)

• 1, 2 & 3 dimensional
• Multithreaded
• Mixed radix
• User-specified scaling, transform sign
• Multiple one-dimensional transforms on a single call
• Strides
• Supports FFTW interface through wrappers

Using the Intel® MKL DFTs

Basically a 3-step process:

1. Create a descriptor.
    Status = DftiCreateDescriptor(&MDH, …)

2. Commit the descriptor (instantiates it).
    Status = DftiCommitDescriptor(MDH)

3. Perform the transform.
    Status = DftiComputeForward(MDH, X)

Optionally free the descriptor.
    Status = DftiFreeDescriptor(&MDH)

MDH: MyDescriptorHandle

Intel® MKL: Cluster FFT

• FFT for clusters of SMP systems
• Works with MPI using BLACS
• 1, 2, 3 and multidimensional
• Requires basic MPI programming skills
• Same interface as the DFT from standard MKL

Intel® MKL: Partial Differential Equations

• Poisson Library
– for fast solving of simple Helmholtz, Poisson, and Laplace problems
• Trigonometric Transform interface routines

Initialize: ?_init_trig_transform
Change routine parameters manually
Commit: ?_commit_trig_transform
Forward/Backward Transform: ?_forward_trig_transform, ?_backward_trig_transform
Free: free_trig_transform

Intel® MKL: Optimization Solvers

Optimization solver routines for:
– solving nonlinear least squares problems without constraints
– solving nonlinear least squares problems with boundary constraints
– computing the Jacobian matrix by central differences for solving nonlinear least squares problems

Based on Trust Region (TR) methods.
– TR strength: global and super-linear convergence, which differentiates them from first-order methods and unmodified Newton methods.
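
The central-difference Jacobian mentioned above uses J[i][j] ≈ (f_i(x + h e_j) - f_i(x - h e_j)) / (2h), with an error of order h^2. MKL provides its own routines for this; the plain-C sketch below only illustrates the formula. The callback type, `jacobian_central` and the test function `example_f` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Callback: evaluates f: R^n -> R^m at x, writing f(x) into fx. */
typedef void (*vec_fn)(size_t n, size_t m, const double *x, double *fx);

/* Central-difference Jacobian, stored row-major as J[i*n + j] = df_i/dx_j.
   fplus and fminus are scratch arrays of length m; x is restored on return. */
void jacobian_central(vec_fn f, size_t n, size_t m, double *x, double h,
                      double *J, double *fplus, double *fminus)
{
    assert(h > 0.0);
    for (size_t j = 0; j < n; j++) {
        double xj = x[j];
        x[j] = xj + h;
        f(n, m, x, fplus);
        x[j] = xj - h;
        f(n, m, x, fminus);
        x[j] = xj;                               /* restore component j exactly */
        for (size_t i = 0; i < m; i++)
            J[i*n + j] = (fplus[i] - fminus[i]) / (2.0 * h);
    }
}

/* Example residual: f(x) = (x0^2, x0*x1); its Jacobian is [[2*x0, 0], [x1, x0]]. */
void example_f(size_t n, size_t m, const double *x, double *fx)
{
    (void)n; (void)m;
    fx[0] = x[0] * x[0];
    fx[1] = x[0] * x[1];
}
```

At x = (1, 2), `jacobian_central(example_f, 2, 2, x, 1e-4, J, fp, fm)` gives J close to {2, 0, 2, 1}. Each column costs two function evaluations, so the whole Jacobian costs 2n evaluations.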

Intel® MKL: GMP

• Arbitrary-precision arithmetic routines on integers
• Interface fully matches the GNU Multiple Precision (GMP) library
• If your application uses GMP functions, link with the MKL and libm libraries.
– For example, on IA-32: $CC prog.c -L$MKL_LIB_PATH -lmkl_intel -lmkl_core -liomp5 -lpthread -lm
• Optimized for Intel processors

Intel® MKL: Support Functions

Intel® MKL support functions are used to:
– retrieve information about the current Intel MKL version
– additionally control the number of threads
– handle errors
– test characters and character strings for equality
– measure user time for a process and elapsed CPU time
– set and measure CPU frequency
– free memory allocated by Intel MKL memory management software

Linking with Intel® MKL

• Static linking
• Dynamic linking
• Custom dynamic linking

Quick Comparison of Intel MKL Linkage Models

Feature | Dynamic Linkage | Static Linkage | Custom Dynamic Linkage
Processor Updates | Automatic | Automatic | Recompile and redistribute
Optimization | All Processors | All Processors | All Processors
Build | Link to import libraries | Link to static libraries | Build separate import libraries, which are created automatically
Calling | Regular Names | Regular Names | Modified Names
Total Binary Size | Large | Small | Small
Executable Size | Smallest | Small | Smallest
Multi-threaded/thread safe | Yes | Yes | Yes


Linking with Intel® MKL (contd.)

Layered model approach for better control:
• Interface layer: compiler interfaces (Intel / GNU); LP64 / ILP64 interfaces
• Threading layer: threaded (Intel / alternate OpenMP) or sequential
• Computational layer
• Run-time layer

Choose the libs from each layer for linking.

Ex 1: Static linking using Intel® Fortran Compiler, BLAS, Intel® 64 processor on Linux

$ ifort myprog.f libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a libiomp5.so

Ex 2: Dynamic linking with Intel® C++ Compiler on Windows

c:\> icl myprog.c mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib

Note: It is strongly recommended to link the run-time layer library dynamically.


Intel® MKL Threading

There are numerous opportunities for threading:
• Level 3 BLAS ( O(n^3) )
• LAPACK* ( O(n^3) )
• FFTs ( O(n log n) )
• VML, VSL: depends on processor and function

Some routines are not threaded because:
– The limiting resource is memory bandwidth.
– Threading Level 1 and Level 2 BLAS is mostly ineffective ( O(n) ).

• Threaded using OpenMP*
  – With support for GCC* and Microsoft* OpenMP*
• ScaLAPACK and Cluster FFT are SMP parallel
• All of Intel® MKL is thread-safe


Threading Control in Intel® MKL

Set an OpenMP or Intel MKL environment variable:
  OMP_NUM_THREADS
  MKL_NUM_THREADS
  MKL_DOMAIN_NUM_THREADS

Or call the OpenMP or Intel MKL function:
  omp_set_num_threads()
  mkl_set_num_threads()
  mkl_domain_set_num_threads()

MKL_DYNAMIC / mkl_set_dynamic(): Intel® MKL decides the number of threads.

Example: You could configure Intel MKL to run 4 threads for BLAS, but sequentially in all other parts of the library.
• Environment variable
  set MKL_DOMAIN_NUM_THREADS="MKL_ALL=1, MKL_BLAS=4"
• Function calls
  mkl_domain_set_num_threads( 1, MKL_ALL);
  mkl_domain_set_num_threads( 4, MKL_BLAS);


Performance Libraries: Intel® MKL
What’s Been Covered

• Intel® Math Kernel Library is a broad scientific/engineering math library.
• It is optimized for Intel® processors.
• It is threaded for effective use on multi-core and SMP machines.


