Intel Math Kernel Library (MKL)

Clay P. Breshears, PhD

Intel Software College

NCSA Multi-core Workshop

July 24, 2007

Copyright © 2007, Intel Corporation. All rights reserved.

Performance Libraries: Intel® Math Kernel Library (MKL)

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda

Performance Features

The Library Sections

• BLAS

• LAPACK*

• DFTs

• VML

• VSL

SciMark 2.0 Optimization Case Study (from Henry Gabb)

• SciMark 2.0 overview

• Tuning with the Intel compiler

• Tuning with the Intel Math Kernel Library

Intel® Math Kernel Library Purpose

Performance, Performance, Performance!

Intel’s engineering, scientific, and financial math library

Addresses:

• Solvers (BLAS, LAPACK)

• Eigenvector/eigenvalue solvers (BLAS, LAPACK)

• Some quantum chemistry needs (dgemm)

• PDEs, signal processing, seismic, solid-state physics (FFTs)

• General scientific, financial [vector transcendental functions (VML) and vector random number generators (VSL)]

Tuned for Intel® processors – current and future

Intel® Math Kernel Library Purpose – Don’ts

But don’t use Intel® Math Kernel Library (Intel® MKL) on …

Don’t use Intel® MKL on “small” counts

Don’t call vector math functions on small n

Example: a geometric transformation, in which a 4x4 transformation matrix maps the point (X, Y, Z, W) to (X’, Y’, Z’, W’)

But you could use Intel® Performance Primitives
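
For a problem this small, a few lines of straight C are usually enough. The following is a minimal sketch of the 4x4 transformation, not code from the talk: the name transform4, the row-major matrix storage, and the matrix-times-column-vector convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Intel MKL or IPP): out = m * in, where m
   is a 4x4 matrix stored row-major and in/out are 4-element vectors
   (X, Y, Z, W). */
void transform4(const double m[16], const double in[4], double out[4])
{
    assert(in != out);  /* in-place use would overwrite inputs still needed */
    for (size_t row = 0; row < 4; row++) {
        double sum = 0.0;
        for (size_t col = 0; col < 4; col++)
            sum += m[4 * row + col] * in[col];
        out[row] = sum;
    }
}
```

With only 16 multiply-adds per point, there is very little work over which a general-purpose library call could amortize its setup cost, which is the point of the slide.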

Intel® Math Kernel Library Environment

Support 32-bit and 64-bit Intel® processors

Large set of examples and tests

Extensive documentation

             Windows*            Linux*
Compilers    Intel, Microsoft    Intel, GNU
Libraries    .dll, .lib          .a, .so

Resource Limited Optimization

The goal of all optimization is maximum speed

Resource-limited optimization: exhaust one or more system resources:

• CPU: Register use, FP units

• Cache: Keep data in cache as long as possible; deal with cache interleaving

• TLBs: Maximally use data on each page

• Memory bandwidth: Minimally access memory

• Computer: Use all the processors/cores available using threading

• System: Use all the nodes available (cluster software)

Threading

Most of Intel® Math Kernel Library could be threaded but:

• Limited resource is memory bandwidth

• Threading level 1 and level 2 BLAS is mostly ineffective ( O(n) )

There are numerous opportunities for threading:

• Level 3 BLAS ( O(n^3) )

• LAPACK* ( O(n^3) )

• FFTs ( O(n log n) )

• VML, VSL: depends on processor and function

All threading is via OpenMP*

All Intel MKL is designed and compiled for thread safety
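
One way to see why memory bandwidth limits threading for the lower BLAS levels is to compare arithmetic with data touched. The counts below are the usual textbook estimates for dense n x n operands, given here as a rough model rather than figures from the talk:

```latex
\begin{aligned}
\text{Level 1 } (y \leftarrow \alpha x + y):&\quad
  \frac{2n \text{ flops}}{2n \text{ operands}} = O(1) \\
\text{Level 2 } (y \leftarrow A x + y):&\quad
  \frac{2n^{2} \text{ flops}}{n^{2} + 2n \text{ operands}} = O(1) \\
\text{Level 3 } (C \leftarrow A B + C):&\quad
  \frac{2n^{3} \text{ flops}}{3n^{2} \text{ operands}} = O(n)
\end{aligned}
```

Only in the Level 3 case does each operand get reused about n times, so extra cores can stay busy without each needing more memory bandwidth.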

SciMark 2.0

Produced by the National Institute of Standards and Technology

ANSI C and Java versions available

Five floating-point-intensive kernels

• FFT: Compute a complex 1D FFT

• SOR: Jacobi successive over-relaxation in 2D

• MC: Compute π by Monte Carlo integration

• MV: Sparse matrix-vector multiplication

• LU: Dense matrix LU factorization

SciMark 2.0 Problem Sizes

Benchmark    Small problem          Large problem
FFT          N = 1024               N = 1048576
SOR          100 x 100              1000 x 1000
MC           Problem size not fixed; no distinction between small and large problems
MV           N = 1000, NZ = 5000    N = 100000, NZ = 1000000
LU           100 x 100              1000 x 1000

Benchmark System

Hardware

CPU (dual-processor system) 3.6 GHz Xeon (2 MB L2 cache) EM64T

Motherboard Intel Server Board SE7520AF2

Memory 512 MB DDR2

BIOS

Version P06

Adjacent Cache Line Prefetch ON

Hardware Prefetch ON

Hyper-Threading Technology OFF

Software

Operating system Red Hat Enterprise Linux AS3

Linux kernel 2.4.21-20.EL #1 SMP

Intel C++ Compiler for Linux 8.1 (l_cce_pc_8.1.024)

Intel Cluster MKL 7.2 (l_cluster_mkl_7.2.008)

GNU C Compiler gcc 3.2.3

GNU Performance Baseline

[Figure: two bar charts, "Small Problems" and "Large Problems", showing MFLOPS for FFT, SOR, MC, MV, LU, and the composite score (Comp.), comparing default and optimized gcc builds.]

Aggressive optimization significantly improves performance relative to the default optimization level. The following gcc options were used to establish baseline performance: -O3 -march=nocona -ffast-math -mfpmath=sse

Intel C++ Compiler for Linux

Performance

• Automatic vectorization

• Streaming SIMD Extensions 3

• IPO and PGO

• Automatic parallelization and OpenMP support

• Automatic CPU dispatch

• Much more...

Compatibility

• Source and object compatible with gcc and g++

• Supports GNU inline ASM

• ANSI/ISO C/C++ standards compliance

• Conforms to the C++ ABI standard

• Integrated with the Eclipse IDE

Tuning SciMark 2.0 with the Intel Compiler

[Figure: two bar charts, "Small Problems" and "Large Problems", showing MFLOPS for FFT, SOR, MC, MV, LU, and the composite score (Comp.), comparing the GNU and Intel compilers.]

The Intel C++ Compiler for Linux improves SciMark 2.0 performance relative to the GNU baseline. Intel compiler options: -O3 -xP -ipo -fno-alias.

Intel® Math Kernel Library Contents: BLAS

BLAS (Basic Linear Algebra Subprograms)

Level 1 BLAS – vector-vector operations

• 15 function types

• 48 functions

Level 2 BLAS – matrix-vector operations

• 26 function types

• 66 functions

Level 3 BLAS – matrix-matrix operations

• 9 function types

• 30 functions

Extended BLAS – level 1 BLAS for sparse vectors

• 8 function types

• 24 functions
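
To make the three levels concrete, here is a minimal sketch of one representative operation from each, written as plain reference loops. The ref_ names are hypothetical and the signatures are simplified for illustration. The corresponding BLAS routines (daxpy, dgemv, dgemm) take more arguments, and an optimized library does not implement them this way.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1 (vector-vector): y = alpha*x + y, O(n) work. */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y = A*x + y, with A n x n in column-major
   order, O(n^2) work. */
void ref_gemv(size_t n, const double *a, const double *x, double *y)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            y[i] += a[i + j * n] * x[j];
}

/* Level 3 (matrix-matrix): C = A*B + C, all n x n in column-major
   order, O(n^3) work. */
void ref_gemm(size_t n, const double *a, const double *b, double *c)
{
    assert(c != a && c != b);  /* the output must not alias an input */
    for (size_t j = 0; j < n; j++)
        for (size_t k = 0; k < n; k++)
            for (size_t i = 0; i < n; i++)
                c[i + j * n] += a[i + k * n] * b[k + j * n];
}
```

The loop nesting depth (one, two, three) mirrors the level number and the growth in work per operand.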

Intel® Math Kernel Library Contents: LAPACK

LAPACK (linear algebra package)

• Solvers and eigensolvers. Many hundreds of routines total!

• There are more than 1000 total user callable and support routines

DFTs (Discrete Fourier transforms)

• Mixed radix, multi-dimensional transforms

• Multithreaded

VML (Vector Math Library)

• Set of vectorized transcendental functions

• Most of libm functions, but faster

VSL (Vector Statistical Library)

• Set of vectorized random number generators

Intel® Math Kernel Library Contents

BLAS and LAPACK* are both Fortran

• Legacy of high performance computation

VSL and VML have Fortran and C interfaces

DFTs have Fortran 95 and C interfaces

cblas interface available

• More convenient for a C/C++ programmer

Intel® Math Kernel Library Optimizations in LAPACK*

Most important LAPACK optimizations:

• Threading – effectively uses multiple cores

• Recursive factorization

• Reduces scalar time (Amdahl’s law: t = t_scalar + t_parallel/p)

• Extends blocking further into the code

No runtime library support required
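
The Amdahl's law formula on this slide can be turned into a small calculation to show why shrinking the scalar part matters. This is a sketch only: the function names are hypothetical, and the model ignores threading overhead.

```c
#include <assert.h>

/* Run time on p cores under the model t = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p)
{
    assert(p >= 1);
    return t_scalar + t_parallel / p;
}

/* Speedup relative to a single core under the same model. */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    return amdahl_time(t_scalar, t_parallel, 1)
         / amdahl_time(t_scalar, t_parallel, p);
}
```

For example, with t_scalar = 1 and t_parallel = 8, four cores give a speedup of 9 / 3 = 3. If the scalar part were removed entirely, the same four cores would give the full factor of 4.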

Tuning the SciMark 2.0 LU Kernel

Replacing the SciMark 2.0 LU kernel with the LAPACK dgetrf function requires attention to detail:

• SciMark 2.0 is written in C

• LAPACK defines a Fortran interface

• C is call-by-value

• Fortran is call-by-reference

• C uses row-major ordering

• Fortran uses column-major ordering

• For best performance, dgetrf requires data to be contiguous in memory

• SciMark 2.0 LU kernel allocates a 2D array as pointers-to-pointers (not necessarily contiguous in memory)
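
One way to handle the layout issues above is to copy the pointers-to-pointers matrix into a single contiguous, column-major buffer before calling the Fortran-interface routine. The sketch below shows only that copy. The helper name to_column_major is hypothetical, and this is not necessarily how the case study did it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: copy an n x n matrix stored as an array of row
   pointers (accessed as a[i][j]) into one contiguous buffer in
   column-major order, the layout a Fortran routine expects.
   Returns NULL if allocation fails; the caller frees the result. */
double *to_column_major(double **a, int n)
{
    assert(a != NULL && n > 0);
    double *buf = malloc((size_t)n * (size_t)n * sizeof *buf);
    if (buf == NULL)
        return NULL;
    for (int j = 0; j < n; j++)          /* column index */
        for (int i = 0; i < n; i++)      /* row index */
            buf[i + (size_t)j * n] = a[i][j];
    return buf;
}
```

Because the Fortran interface is call-by-reference, scalar arguments such as the matrix dimensions are then passed to dgetrf by address rather than by value.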

Tuning the SciMark 2.0 LU Kernel

[Figure: bar chart "SciMark 2.0 LU Kernel", showing MFLOPS for the small and large problems, comparing the GNU baseline, Intel compiler, Intel MKL LAPACK, and Intel MKL LAPACK + OpenMP.]

Intel MKL LAPACK significantly improves performance over the original SciMark 2.0 LU source code.

Intel® Math Kernel Library Contents Discrete Fourier Transforms

One dimensional, two-dimensional, three-dimensional…

Multithreaded

Mixed radix

User-specified scaling, transform sign

Transforms on embedded matrices

Multiple one-dimensional transforms in a single call

Strides

C and F90 interfaces; FFTW interface support

Using the Intel® Math Kernel Library DFTs

Basically a 3-step Process

Create a descriptor

Status = DftiCreateDescriptor(MDH, …)

Commit the descriptor (instantiates it)

Status = DftiCommitDescriptor(MDH)

Perform the transform

Status = DftiComputeForward(MDH, X)

Optionally free the descriptor

MDH: MyDescriptorHandle

Tuning the SciMark 2.0 FFT Kernel

#include <mkl.h>

int N = 1024;                           // Size of SciMark 2.0 small FFT problem
double scale = 1.0 / (double)N;

double *x = RandomVector ((2 * N), R);  // SciMark creates a random vector
                                        // of size 2*N to hold real and
                                        // imaginary parts
DFTI_DESCRIPTOR *dftiHandle;            // Structure for MKL DFT descriptor

DftiCreateDescriptor (&dftiHandle,      // Transform descriptor
                      DFTI_DOUBLE,      // Precision
                      DFTI_COMPLEX,     // Complex-to-complex
                      1,                // Number of dimensions
                      N);               // Size of transform

// Apply scaling factor to backward transform
DftiSetValue (dftiHandle, DFTI_BACKWARD_SCALE, scale);

DftiCommitDescriptor (dftiHandle);

DftiComputeForward (dftiHandle, x);     // Apply DFT to array x
DftiComputeBackward (dftiHandle, x);

DftiFreeDescriptor (&dftiHandle);

Tuning the SciMark 2.0 FFT Kernel

[Figure: bar chart "SciMark 2.0 FFT Kernel", showing MFLOPS for the small and large problems, comparing the GNU baseline, Intel compiler, and Intel MKL DFT.]

The Intel MKL DFT significantly improves performance over the original SciMark 2.0 FFT source code.

Intel® Math Kernel Library Contents Vector Math Library (VML)

Vector Math Library: vectorized transcendental functions – like libm but better (faster)

Interfaces: both Fortran and C

Multiple accuracies

• High accuracy ( < 1 ulp )

• Lower accuracy, faster ( < 4 ulps )

Special value handling: √(-a), sin(0), and so on

Error handling – cannot duplicate libm here

VML: Why Does It Matter?

It is important for financial codes (Monte Carlo simulations)

• Exponentials, logarithms

Other scientific codes depend on transcendental functions

Error functions can be big time sinks in some codes

Intel® Math Kernel Library Contents Vector Statistical Library (VSL)

Set of random number generators (RNGs)

Numerous non-uniform distributions

VML used extensively for transformations

Parallel computation support – some functions

User can supply own BRNG or transformations

Five basic RNGs (BRNGs)

• MCG31, R250, MRG32, MCG59, WH

Non-Uniform RNGs

Gaussian (two methods)

Exponential

Laplace

Weibull

Cauchy

Rayleigh

Lognormal

Gumbel

Using VSL

Basically a 3-step Process

Create a stream pointer

VSLStreamStatePtr stream;

Create a stream

vslNewStream(&stream, VSL_BRNG_MCG31, seed);

Generate a set of random numbers

vsRngUniform(0, stream, size, out, start, end);

Delete a stream (optional)

vslDeleteStream(&stream);

Calculating Pi by Monte Carlo

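$$\frac{\#\text{ of darts hitting circle}}{\#\text{ of darts in square}} = \frac{\tfrac{1}{4}\pi r^{2}}{r^{2}} = \frac{\pi}{4}$$

$$\pi = 4 \cdot \frac{\#\text{ of darts hitting circle}}{\#\text{ of darts in square}}$$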

hits = 0
Loop I = 1 to N_samples
    x.coor = random [0..1]
    y.coor = random [0..1]
    dist = sqrt (x.coor^2 + y.coor^2)
    if dist <= 1
        hits = hits + 1
Pi = 4 * hits / N_samples
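
As a plain-C baseline for this algorithm, here is a minimal, self-contained sketch of the dart-throwing loop. The small linear congruential generator is an assumption made only to keep the example deterministic and free of library dependencies. It is not the generator used by SciMark 2.0 or by VSL, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit linear congruential generator, for illustration only.
   Returns a double in [0, 1) built from the top 53 bits of the state. */
static double lcg_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0;  /* divide by 2^53 */
}

/* Estimate pi by sampling points in the unit square and counting those
   that fall inside the quarter circle of radius 1. */
double estimate_pi(long n_samples, uint64_t seed)
{
    assert(n_samples > 0);
    uint64_t state = seed;
    long hits = 0;
    for (long i = 0; i < n_samples; i++) {
        double x = lcg_uniform(&state);
        double y = lcg_uniform(&state);
        if (x * x + y * y <= 1.0)   /* equivalent to sqrt(x^2 + y^2) <= 1 */
            hits++;
    }
    return 4.0 * (double)hits / (double)n_samples;
}
```

Comparing the squared distance avoids the square root without changing which points count as hits. Generating one pair of random numbers per iteration, as here, is the pattern that a vectorized generator replaces with one call per block of samples.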

Tuning the SciMark 2.0 MC Kernel

#include <math.h>   // sqrt
#include <mkl.h>

// BLOCK_SIZE (samples per block) and SEED are assumed to be defined elsewhere.
double MonteCarlo_integrate (int Num_samples)
{
    int i, j, blocks, under_curve = 0;
    static double rnBuf[2 * BLOCK_SIZE];
    double rnX, rnY;
    VSLStreamStatePtr stream;

    blocks = Num_samples / BLOCK_SIZE;
    vslNewStream (&stream, VSL_BRNG_MCG31, SEED);

    for (i = 0; i < blocks; i++) {
        // One call fills the buffer with 2 * BLOCK_SIZE uniform values
        vdRngUniform (VSL_METHOD_DUNIFORM_STD, stream,
                      (2 * BLOCK_SIZE), rnBuf, 0.0, 1.0);

        for (j = 0; j < BLOCK_SIZE; j++) {
            rnX = rnBuf[2*j];
            rnY = rnBuf[2*j+1];
            if (sqrt(rnX*rnX + rnY*rnY) <= 1.0)
                under_curve++;
        }
    }
    vslDeleteStream (&stream);

    // Assumes Num_samples is a multiple of BLOCK_SIZE; otherwise the
    // remaining samples are never drawn.
    return ((double) under_curve / Num_samples) * 4.0;
}

Tuning the SciMark 2.0 MC Kernel

[Figure: bar chart "SciMark 2.0 MC Kernel", showing MFLOPS for the GNU baseline, Intel compiler, Intel MKL VSL, and Intel MKL VSL + OpenMP.]

The Intel MKL VSL significantly improves performance over the original SciMark 2.0 MC source code.

Best SciMark 2.0 Single Node Performance: Small Problems

Small Problems (MFLOPS)

         GNU     Intel    Speedup
FFT      510     1817     3.6
SOR      524     1092     2.1
MC       206     1003     4.9
MV       857      832     1.0
LU       884     1827     2.1
Comp.    596     1314     2.2

Best SciMark 2.0 Single Node Performance: Large Problems

Large Problems (MFLOPS)

         GNU     Intel    Speedup
FFT       45      600     13.3
SOR      495     1015     2.1
MC       206     1003     4.9
MV       453      457     1.0
LU       392     6646     16.9
Comp.    318     1944     6.1
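
The Comp. rows are consistent with a composite score taken as the arithmetic mean of the five kernel scores, and each speedup is the Intel figure divided by the GNU figure. A minimal sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of per-kernel MFLOPS scores. */
double composite_score(const double *mflops, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mflops[i];
    return sum / (double)n;
}

/* Ratio of tuned to baseline performance. */
double speedup(double baseline, double tuned)
{
    assert(baseline > 0.0);
    return tuned / baseline;
}
```

For the large problems, the GNU scores (45, 495, 206, 453, 392) average to about 318 and the Intel scores (600, 1015, 1003, 457, 6646) to about 1944, matching the table. The large-problem composite is dominated by the LU kernel.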

Intel® Cluster MKL

Intel Cluster MKL is a superset of MKL for solving large linear algebra problems on a cluster

Intel Cluster MKL contains:

• ScaLAPACK (Scalable LAPACK)

• BLACS (Basic Linear Algebra Communication Subprograms)

Supports MPICH and the Intel MPI Library

Data Layout Critical to Parallel Performance

ScaLAPACK uses 2D block-cyclic data distribution

Example layouts of lower triangular matrix for four processes

[Figure: three layouts of a lower triangular matrix across processes 0 to 3, namely 1D block distribution, 1D block-cyclic distribution, and 2D block-cyclic distribution. Load balance ranges from poor (1D block) to better (2D block-cyclic).]
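
The block-cyclic idea can be sketched as a mapping from a global index to the process that owns it: blocks of nb consecutive indices are dealt out to the processes in round-robin order. The function names and the row-by-row rank numbering below are assumptions for illustration, not the ScaLAPACK API.

```c
#include <assert.h>

/* Owner of global index i (0-based) in a 1D block-cyclic distribution
   with block size nb over nprocs processes. */
int block_cyclic_owner(int i, int nb, int nprocs)
{
    assert(i >= 0 && nb > 0 && nprocs > 0);
    return (i / nb) % nprocs;
}

/* Rank of the process owning element (i, j) on a prow x pcol process grid,
   applying the 1D rule independently to rows and columns. Ranks are
   numbered row by row, which is an assumption of this sketch. */
int block_cyclic_owner_2d(int i, int j, int nb, int prow, int pcol)
{
    int r = block_cyclic_owner(i, nb, prow);
    int c = block_cyclic_owner(j, nb, pcol);
    return r * pcol + c;
}
```

Because ownership wraps around, every process holds blocks from all regions of the matrix, so each keeps a share of the work even when only part of the matrix (such as a lower triangle) is active.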

Parallelizing the SciMark 2.0 LU Kernel with Intel® Cluster MKL

1. Initialize the process grid

2. Create a descriptor for each distributed matrix

3. Replace the call to dgetrf with pdgetrf (the ‘p’ is for parallel)

Result: LU factorization of a 40000 x 40000 matrix on an 8-node, dual 3.0 GHz Xeon cluster achieves 46000 MFLOPS.

Performance Libraries: Intel® MKL – What’s Been Covered

Intel® Math Kernel Library is a broad scientific/engineering math library

It is optimized for Intel® processors

It is threaded for effective use on multi-core and SMP machines

The Intel C++ Compiler for Linux improves SciMark 2.0 performance without requiring code modifications

With minor code modifications, Intel MKL dramatically improves the FFT, MC, and LU kernels

Some SciMark 2.0 kernels benefit from parallel computing

Useful Links

Intel Software Products

• http://www.intel.com/software/products/

Intel Software Network

• http://www.intel.com/software/

Intel Software College

• http://www.intel.com/software/college/

SciMark 2.0

• http://math.nist.gov/scimark2/index.html
