Software & Services Group, Developer Products Division Copyright© 2012 Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Intel® Software Development Products for High Performance Computing
and Parallel Programming
Multicore development tools with extensions to many-core
Notices
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. This document contains information on products in the design phase of development. All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
Intel, VTune, Cilk, Xeon and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
Copyright© 2012 Intel Corporation. All rights reserved.
Table of Contents
• Intel in HPC
• Intel HPC Software Development Products
• High Performance Parallel Programming with Intel’s architectures
• Features and benefits
– Investment protection
– Better performance & efficiency
• Call to Action and Summary
Intel in High-Performance Computing
A long term commitment to the HPC market segment
• Large Scale Clusters for Test & Optimization
• Tera-Scale Research
• Leading Performance, Energy Efficient
• Platform Building Blocks
• Dedicated, Renowned Applications Expertise
• Broad Software Tools Portfolio
• Defined HPC Application Platform
• Many Integrated Core Architecture
• Manufacturing Process Technologies
• Exa-Scale Labs
Intel Technology is Changing HPC Performance, Energy Efficiency, Reliability, TCO
• Solid State Disk: Optimize Performance for I/O Intensive Apps and Boot Drive Replacement
• 10GbE: Bridging the Gap Between 1GbE and InfiniBand*, with RDMA, Unified Networking
• Processors (Xeon®, MIC): Scalable Performance and Energy Efficiency, Multi- and Many-Core
A platform approach to high performance
[Figure: a cluster of nodes joined by an interconnect. Each node has processors (P) with multiple cores, memory (M) and I/O, and may include a co-processor.]
Levels of parallelism:
• Message Passing between/inside Nodes
• Multi-Threading within each (SMP) Node
• Vectorization (SIMD) within each Core
The Majority of all HPC-Systems are Clusters (Source: IDC)
Software Development and System Environment
Intel® Xeon® Processor and Intel® Many Integrated Core Architecture (die sizes not to scale)
• Linux*: Established HPC Operating System
• Same Comprehensive Set of SW Tools: Application Source Code Builds with a Compiler Switch
Scaling Performance Forward
Software Tools Vision
Employ versatile and common development tools across all IA architectures
• Single Portable Software Stack
• Flexible Programmability
• Scalable Performance
Parallel models: Data-Parallelism, Thread-Parallelism, Messaging, . . .
High Performance Parallel Programming
Features and Benefits: Details
Enabling & Advancing Parallelism
High Performance Parallel Programming
Develop & Parallelize Today for Maximum Performance
Use One Software Architecture Today. Scale Forward Tomorrow.
Code: Compiler, Libraries, Parallel Models
Targets:
• Multicore: Multicore CPU
• Many-core: Multicore CPU with Intel® MIC Architecture Co-processor
• Cluster: Multicore Cluster
• Cluster: Multicore & Many-core Cluster
More cores. Wider vectors. Co-Processors. Tools need to access all three dimensions to deliver performance
Images do not reflect actual die sizes

Processor                                           Core(s)  Threads  SIMD width (bits)  Instruction set
Intel® Xeon® processor, 64-bit                      1        2        128                SSE2
Intel® Xeon® processor 5100 series                  2        2        128                SSSE3
Intel® Xeon® processor 5500 series                  4        8        128                SSE4.2
Intel® Xeon® processor 5600 series                  6        12       128                SSE4.2
Intel® Xeon® processor code-named Sandy Bridge      8        16       256                AVX
Intel® Xeon® processor code-named Ivy Bridge        -        -        256                AVX
Intel® Xeon® processor code-named Haswell           -        -        256                AVX2, FMA3
Intel® MIC co-processor code-named Knights Ferry    32       128      512                -
Intel® MIC co-processor code-named Knights Corner   >50      >200     512                -
(- = not stated)
Software challenge: Develop scalable software
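To make the SIMD widths listed above concrete, the following sketch (an illustration, not part of the original slides) computes how many elements of a given size fit in one vector register. The helper name `lanes` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements of `element_bytes` bytes that fit in one SIMD register
// that is `simd_bits` wide. Hypothetical helper for illustration only.
inline std::size_t lanes(std::size_t simd_bits, std::size_t element_bytes) {
    assert(element_bytes > 0);
    return simd_bits / (8 * element_bytes);
}
```

For example, `lanes(128, sizeof(float))` is the per-instruction element count for 128-bit SSE, and `lanes(512, sizeof(float))` the count for the 512-bit co-processor vectors, assuming the usual 4-byte `float`.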
High Performance Software Products Supporting Multicore and Many-core Development
Intel® Cluster Studio XE* Distributed Performance
Intel® Parallel Studio XE* Advanced Performance
Intel® Trace Analyzer and Collector
Intel® MPI Library
Intel® Inspector XE, Intel® VTune™ Amplifier XE, Intel® Advisor
Intel® C/C++ and Fortran Compilers w/OpenMP
Intel® MKL, Intel® Cilk Plus, Intel® TBB Library, Intel® ArBB Library, Intel® IPP Library
Intel® Parallel Studio XE
Performance. Scale Forward. Proven
Invest in Common Tools and Programming Models
Use One Software Architecture Today. Scale Forward Tomorrow.
Multicore
• Intel® Xeon® processors are designed for intelligent performance and smart energy efficiency
• Continuing to advance Intel® Xeon® processor family and instruction set (e.g., Intel® AVX, etc.)
+
Many-core
• Intel® MIC Architecture co-processors are ideal for highly parallel computing applications
• Software development platforms ramping now
Optimized Intel Libraries
Intel® MKL (Math Kernel Library)
• Oriented toward science, engineering and financial applications
• Incl. BLAS, LAPACK, ScaLAPACK, Sparse Solvers, Fast Fourier Transforms, Vector Math
Intel® IPP (Integrated Performance Primitives)
• Oriented toward multimedia, data processing and communications applications
• Cryptography and String Processing
Go Parallel with High Performance Math Kernel Library
Intel® Math Kernel Library (Intel® MKL)
#include <mkl.h>
/* Intel® Math Kernel Library: C = alpha*op(A)*op(B) + beta*C */
void foo(char transa, char transb, MKL_INT N, float alpha, float beta,
         float *A, float *B, float *C /* N x N matrices */)
{
    sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
}
Targets: Intel® Xeon® processor, Intel® MIC co-processor
Implicit automatic offloading requires no code changes; simply link with the offload MKL Library.
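For readers less familiar with BLAS, the sketch below shows what the `sgemm` call above computes in the non-transposed case: C = alpha*A*B + beta*C on square column-major matrices. It is a plain reference loop for illustration, not how MKL implements the routine, and the name `sgemm_reference` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference for C = alpha*A*B + beta*C with N x N column-major matrices,
// where element (i, j) is stored at index i + j*N. This corresponds to
// calling sgemm with both transpose flags set to 'N'.
inline void sgemm_reference(std::size_t N, float alpha,
                            const std::vector<float>& A,
                            const std::vector<float>& B,
                            float beta, std::vector<float>& C) {
    assert(A.size() == N * N && B.size() == N * N && C.size() == N * N);
    for (std::size_t j = 0; j < N; ++j) {
        for (std::size_t i = 0; i < N; ++i) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < N; ++k) {
                acc += A[i + k * N] * B[k + j * N];
            }
            C[i + j * N] = alpha * acc + beta * C[i + j * N];
        }
    }
}
```

A tuned library performs the same arithmetic with blocking, vectorization and threading, which is why the slide recommends calling it rather than writing the loops by hand.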
Go Parallel with Intel® Cilk™ Plus
• Proven Cilk parallel model, teachable in one minute
– Parallelism in Three Key Words: cilk_spawn, cilk_sync, cilk_for
• Cilk™ Plus: an open specification
– Recently placed into open source by Intel for the advancement of parallel programming
Learn more at http://cilkplus.org
#include <cilk/cilk.h>

// Parallel function invocation, in C
cilk_for (int i=0; i<n; ++i) {
    Foo(a[i]);
}

// Parallel spawn in a recursive fibonacci computation, in C
int fib (int n) {
    if (n < 2) return 1;
    else {
        int x, y;
        x = cilk_spawn fib(n-1);   // may run in parallel with the next line
        y = fib(n-2);
        cilk_sync;                 // wait for the spawned call before using x
        return x + y;
    }
}
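The spawn/sync pattern above can also be sketched with OpenMP tasks, which many current compilers accept; this is an illustrative translation, not Intel's code. `omp task` plays the role of `cilk_spawn` and `omp taskwait` the role of `cilk_sync`. The names `fib_task` and `fib_parallel` and the cutoff of 20 are assumptions made for the example. Without OpenMP enabled the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cassert>

// Task-parallel Fibonacci with the same base case as the slide
// (fib(0) = fib(1) = 1).
inline int fib_task(int n) {
    assert(n >= 0);
    if (n < 2) return 1;
    int x = 0, y = 0;
    // Create a task only for larger subproblems; the cutoff is illustrative.
    #pragma omp task shared(x) if (n > 20)
    x = fib_task(n - 1);
    y = fib_task(n - 2);
    // Wait for the child task before reading x.
    #pragma omp taskwait
    return x + y;
}

// Entry point: one thread of a parallel region creates the root task,
// and the other threads pick up the tasks it generates.
inline int fib_parallel(int n) {
    int result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib_task(n);
    return result;
}
```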
Go Parallel with Intel® Cilk™ Plus
• Data and Task Parallelism as first class citizens in C and C++
– Vectorization via intuitive notations that automatically span MMX, SSE, AVX, and wider widths in the future, including those in MIC co-processors:
  • array notations
  • #pragma simd controls
  • elemental functions
// #pragma simd: user-mandated vectorization
#pragma simd
for (i=0; i<n; i++) {
    A[i] = A[i] + B[i] + C[i];
}

// Simplify operation using array notations in C/C++:
a[:] = b[:] + c[:];

// Elemental functions, in C++ (note the reference parameter), using Cilk Plus:
__declspec (vector) void saxpy(float a, float x, float &y) {
    y += a * x;
}
Learn more at http://cilkplus.org
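As a point of comparison, the array-notation and elemental-function examples above correspond to the ordinary loops sketched below in standard C++. This is an illustration under stated assumptions: the function names are hypothetical, and `#pragma omp simd` (the OpenMP 4.0 construct) is used here in place of the Intel-specific `#pragma simd`; a compiler without OpenMP support ignores it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop equivalent of the array-notation statement a[:] = b[:] + c[:].
inline void add_arrays(std::vector<float>& a, const std::vector<float>& b,
                       const std::vector<float>& c) {
    assert(a.size() == b.size() && a.size() == c.size());
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
}

// Scalar saxpy applied element by element: the per-lane work that an
// elemental function asks the compiler to spread across SIMD lanes.
inline void saxpy_arrays(float a, const std::vector<float>& x,
                         std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = y.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```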
Go Parallel with Intel® Threading Building Blocks (Intel® TBB)
• A popular parallel abstraction for C++ developers
– A C++ template library
– Scalable memory allocation
– Load-balancing
– Work-stealing task scheduling
– Thread-safe pipeline
– Concurrent containers
– High-level parallel algorithms
– Numerous synchronization primitives
• Intel remains a leading participant and contributor in the TBB open source project as well as a leading supplier of TBB support and supporting tools
// Parallel function invocation example, in C++, using TBB:
#include <tbb/parallel_for.h>

tbb::parallel_for (0, n, [=](int i) {
    Foo(a[i]);
});
Learn more at http://threadingbuildingblocks.org
Go Parallel with Message Passing Interface
Intel® Message Passing Interface (Intel® MPI)
• Extend your cluster solutions to the Intel® MIC Architecture
– E.g., Intel MIC in every node of the cluster using Intel® MPI and Intel® Parallel Building Blocks on nodes
– Same model as an Intel® Xeon processor based cluster
• Intel is a leading vendor of MPI implementations and tools
[Figure: a multicore cluster alongside clusters with multicore and many-core nodes]
Learn more at http://intel.com/go/mpi
Go Parallel with Coarray Fortran Intel® Fortran Compiler
• A standard, explicit notation for data decomposition, such as that often used in message-passing models, expressed in a natural Fortran-like syntax.
• For parallel programming on both shared memory and distributed memory systems
! Sum in Fortran, using the coarray feature (Fortran 2008 SYNC ALL syntax).
! After the loop, image k holds the sum of the values on images 1..k.
REAL SUM[*]
SYNC ALL
DO IMG = 2, NUM_IMAGES()
  IF (IMG == THIS_IMAGE()) THEN
    SUM = SUM + SUM[IMG-1]
  ENDIF
  SYNC ALL
ENDDO
Learn more at http://intel.com/software/products
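To show what the coarray loop above leaves on each image, here is a serial model in C++ (an illustration only, not coarray semantics): element k of the vector stands for the value of `SUM` on image k+1. The helper name `coarray_running_sum` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model of the coarray loop: each "image" in turn adds the value
// held by its left neighbour, so the result is a running (prefix) sum.
inline std::vector<float> coarray_running_sum(std::vector<float> sum) {
    assert(!sum.empty());
    for (std::size_t img = 1; img < sum.size(); ++img) {
        sum[img] += sum[img - 1];
    }
    return sum;
}
```

In the Fortran version the synchronization between iterations is what guarantees that image IMG reads its neighbour's value only after the neighbour has finished updating it.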
Go Parallel with OpenMP* Intel® C/C++ and Fortran Compilers
• A flexible interface for developing parallel applications
– An abstraction for multi-threaded solutions
• OpenMP* is a standard used by many parallel applications
– Supported by every major compiler for Fortran, C, and C++
// C/C++ OpenMP* pragma
#pragma omp parallel for reduction(+:pi)
for (i=0; i<count; i++) {
    float t = (float)((i+0.5)/count);
    pi += 4.0/(1.0+t*t);
}
pi /= count;

! Fortran OpenMP* directive
!$omp parallel do
do i=1,10
    A(i) = B(i) * C(i)
enddo
!$omp end parallel do
Learn more at http://openmp.org
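The C/C++ fragment above omits its declarations. A self-contained version might look like the following sketch, which wraps the loop in a hypothetical function `estimate_pi` and uses `double` rather than the slide's `float` for the intermediate value. The loop is a midpoint-rule estimate of the integral of 4/(1+t²) over [0, 1], which equals pi.

```cpp
#include <cassert>

// Midpoint-rule estimate of pi with `count` intervals. The reduction
// clause gives each thread a private partial sum and combines them at
// the end of the loop.
inline double estimate_pi(long count) {
    assert(count > 0);
    double pi = 0.0;
    #pragma omp parallel for reduction(+:pi)
    for (long i = 0; i < count; i++) {
        double t = (i + 0.5) / count;
        pi += 4.0 / (1.0 + t * t);
    }
    return pi / count;
}
```

Build with the compiler's OpenMP switch (for example `-fopenmp` with GCC) to run the loop in parallel; without it the pragma is ignored and the function still returns the same estimate.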
Go Parallel with OpenMP*
Intel® C/C++ and Fortran Compilers
One Line Change to Offload to MIC Co-Processor
Targets: Intel® Xeon® processor, Intel® MIC co-processor
/* N, the number of integration steps, is defined elsewhere */
int main() {
    double pi = 0.0;
    long i;
    #pragma offload target (mic)               /* the one added line */
    #pragma omp parallel for reduction(+:pi)
    for (i=0; i<N; i++) {
        double t = (double)((i+0.5)/N);
        pi += 4.0/(1.0+t*t);
    }
    printf("pi = %f\n", pi/N);
    return 0;
}
Go Parallel with C/C++ Language Extensions
• Simple keyword language extensions to control offloading to MIC co-processor
C/C++ Language Extensions
class _Shared common {
public:
    int data1;
    char *data2;
    class common *next;
    void process();
};
_Shared class common obj1, obj2;
…
_Cilk_spawn _Offload obj1.process();
_Cilk_spawn obj2.process();
…
Use the Same Code for Execution on Intel® MIC Architecture by Offloading
C/C++ Offload Pragma
#pragma offload target (mic)
#pragma omp parallel for reduction(+:pi)
for (i=0; i<count; i++) {
float t = (float)((i+0.5)/count);
pi += 4.0/(1.0+t*t);
}
pi /= count;
MKL Implicit Offload
// MKL implicit offload requires no source code changes; simply link with the offload MKL Library.
MKL Explicit Offload
#pragma offload target (mic) \
    in(transa, transb, N, alpha, beta) \
    in(A:length(matrix_elements)) \
    in(B:length(matrix_elements)) \
    in(C:length(matrix_elements)) \
    out(C:length(matrix_elements) alloc_if(0))
sgemm(&transa, &transb, &N, &N, &N, &alpha,
      A, &N, B, &N, &beta, C, &N);
Fortran Offload Directive
!dir$ omp offload target(mic)
!$omp parallel do
do i=1,10
    A(i) = B(i) * C(i)
enddo
!$omp end parallel do
C/C++ Language Extensions
class _Shared common {
public:
    int data1;
    char *data2;
    class common *next;
    void process();
};
_Shared class common obj1, obj2;
…
_Cilk_spawn _Offload obj1.process();
_Cilk_spawn obj2.process();
…
Parallelism with OpenCL*
Intel® OpenCL SDK
• OpenCL* is a framework for writing programs that execute across heterogeneous platforms (e.g., CPUs, GPUs, many-core)
• Intel is a leading participant in the OpenCL* standard efforts, and a vendor of solutions and related tools with early implementations available today.
• OpenCL* addresses the needs of customers in specific segments
//Simple per element multiplication using OpenCL*:
kernel void dotprod( global const float *a,
global const float *b,
global float *c)
{
int myid = get_global_id(0);
c[myid] = a[myid] * b[myid];
}
Learn more at http://intel.com/go/opencl
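Despite its name, the `dotprod` kernel above performs an element-wise multiplication: each work-item handles one index. The serial C++ sketch below models that behaviour for illustration. It is not OpenCL host code; the names `dotprod_item` and `run_dotprod` are hypothetical, and the loop stands in for launching one work-item per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by a single work-item; `myid` plays the role of
// get_global_id(0) in the kernel.
inline void dotprod_item(std::size_t myid, const float* a, const float* b,
                         float* c) {
    c[myid] = a[myid] * b[myid];
}

// Serial stand-in for a kernel launch with one work-item per element.
inline std::vector<float> run_dotprod(const std::vector<float>& a,
                                      const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t id = 0; id < a.size(); ++id) {
        dotprod_item(id, a.data(), b.data(), c.data());
    }
    return c;
}
```

Because no work-item reads another's output, the iterations are independent, which is what lets an OpenCL runtime distribute them across CPU cores, GPU lanes or many-core devices.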
Running your Application
Execution on the host and Intel® MIC Co-processor(s)
[Figure: software stack. Host side: Host Program, Host Offload Library, Message Library. Target side: Target Program, Target Offload Library, Message Library.]
Without: Intel® MIC Co-processor(s) are absent
• Application starts and executes on host
• At each offload, the construct runs on host cores/threads
• Normal program termination on host
With: Intel® MIC Co-processor(s) are present
• Application starts on host and executes portions on Intel® MIC Co-processor(s)
• At runtime, if Intel® MIC Co-processor(s) are available, the target binary is loaded
• At each offload, the construct runs on the Intel® MIC Co-processor(s)
• At program termination, target binary is unloaded
Using the Intel® Debugger Overview
• Debugging of host and target simultaneously
• If host application is being debugged, target application is also debugged automatically
• Debugger runs on host for both host and target program
• Debugger halts and resumes both host and target program synchronously
• Full C, C++ and Fortran support on both sides
• Future: debugger presents view of one virtual application inside a single GUI
• Extensible to cover more than one offload card
[Figure: the Intel® Debugger with the host program and two target programs, each target paired with an Intel® Debug Server]
Analyzing your Application Performance Analysis Tools
• Intel® VTune™ Amplifier XE performance profiler
– Analyze your multicore and many-core performance
• Analyze performance of the application in offload mode
• Support for Intel® MIC Co-processors includes:
– A Linux* hosted command line tool that collects events
– The VTune™ Amplifier XE graphical user interface to display results collected in previous step highlighting bottlenecks, time spent and other details of performance.
Preserve Your Development Investment
Common Tools and Programming Models for Parallelism
Targets: Multicore, Many-core, Heterogeneous Computing
• C/C++ (Intel® C/C++ Compiler): Intel® Cilk Plus, Intel® TBB, OpenCL*, OpenMP*, Offload Pragmas
• Fortran (Intel® Fortran Compiler): OpenMP*, Coarray, Offload Directives
• Both: Intel® MPI, Intel® MKL
Develop Using Parallel Models that Support Heterogeneous Computing
Call to Action
• Evaluate the Intel® Software Development Products, including the family of Parallel Programming Models, for your High Performance needs:
http://www.intel.com/software/products/eval
• For product information see:
http://www.intel.com/software/products
Optimization Notice