INTEL® HPC DEVELOPER CONFERENCE FUEL YOUR INSIGHT
JUPYTER: PYTHON, JULIA, C, AND MKL
HPC BATTERIES INCLUDED
Oleg Mikulchenko
Intel Corporation
November 2016
Agenda
§ Motivation (“Insight, not numbers”)
§ Use case 1: Python, Julia, and C in Jupyter
§ Use case 2: MKL in Jupyter (on Knights Landing)
§ Use case 3: Knights Landing acceleration and SW infrastructure
§ Call for actions
MOTIVATION
“The purpose of computing is insight, not numbers.” Richard Hamming
We need to define a strategy for getting from numbers to insight
Numbers to Insight Strategy
Get most insight
Make productive application development
Make productive software development
Use optimized high performance software
Use optimized high performance hardware
C/C++, Fortran libraries/blocks: Intel® MKL, IPP, TBB, DAAL, …
Productivity languages: Python, Julia, Matlab,…
Frameworks/toolboxes: Caffe, Theano/Keras, TensorFlow, …
Live books: Jupyter
API
Kernels
Jupyter: Intro in 2 minutes
The Jupyter Notebook is an interactive computing environment that enables users to author notebook documents (web applications) that include:
§ Live code (Julia, Python, R, and more: 50+ languages, language agnostic)
§ Interactive widgets
§ Plots
§ Narrative text
§ Equations
§ Images
§ Videos
JupyterHub is a multiuser version of the notebook, designed for centralized deployments:
§ Pluggable authentication
§ Collaboration with others through Linux
§ Deployments for all users on centralized servers (on-site or off-site)
§ Container (Docker) friendly – facilitates scaling and process isolation
§ Code meets data – locate notebooks where the data lives
§ Very popular with many deep learning frameworks
§ Likely HPC-ready
USE CASE: JULIA, PYTHON, AND C IN JUPYTER
Jupyter is a web
Header and GUI
Text Headers
Code
Outputs
Text
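The parts labeled on this slide (text headers, code, outputs, text) correspond to cells stored in the notebook file. A minimal sketch of that structure, built with only the Python standard library, might look like the following. The cell contents and helper names are hypothetical; the field names follow the nbformat 4 layout.

```python
import json

def markdown_cell(text):
    """Narrative text or a text header (Markdown source)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}

def code_cell(source):
    """Live code; outputs stay empty until a kernel runs the cell."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,  # minor version 4 predates mandatory cell ids
    "metadata": {},
    "cells": [
        markdown_cell("# Run-length study"),         # hypothetical text header
        markdown_cell("Narrative text goes here."),  # hypothetical text
        code_cell("print(2 + 2)"),                   # hypothetical live code
    ],
}

# A notebook file (.ipynb) is this dictionary serialized as JSON.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json[:200])
```

Because the file is plain JSON, the same document can be opened in a browser, versioned, or generated by a script.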
High level preparation
Prepare arguments
Do parallel
Put content
C call in one line
C call
Conversion to Julia array
C Function
Library, Format, Arguments
Fast, small overhead
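The slide's Julia code did not survive extraction, so here is an analogous sketch in Python using the standard `ctypes` module rather than Julia's own call syntax. It shows the same three ingredients (library, format, arguments) for a C function, assuming a POSIX system where the C math library can be located; `pow` is only a stand-in for the deck's C function.

```python
import ctypes
import ctypes.util

# Library: locate the C math library. If the lookup returns None, CDLL(None)
# falls back to symbols already loaded into the process (POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Format: declare the C signature  double pow(double, double).
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double

# Arguments: the call itself is a single line.
result = libm.pow(2.0, 10.0)
print(result)

# Conversion to a native container: copy a C array into a Python list.
c_array = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
values = list(c_array)
print(values)
```

Declaring `argtypes` and `restype` up front matters: without them `ctypes` assumes `int` arguments and return values, which silently corrupts floating-point results.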
High level processing in Julia
Define function
Call function
Fast, almost as fast as C (~0.5x)
High-level plotting/processing in Python
Get insights on run-length distribution
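The plotting code from this slide is not in the extracted text. As a rough illustration of what a run-length distribution is, the sketch below counts runs of identical bits in a pseudo-random sequence and prints a text histogram. The function names, seed, and sequence length are assumptions for illustration, not the author's code.

```python
import random
from collections import Counter
from itertools import groupby

def run_lengths(bits):
    """Lengths of consecutive runs of equal values, in order of appearance."""
    return [sum(1 for _ in group) for _, group in groupby(bits)]

def run_length_distribution(bits):
    """Map each run length to the number of runs of that length."""
    return Counter(run_lengths(bits))

rng = random.Random(2016)  # fixed seed so the sketch is repeatable
bits = [rng.getrandbits(1) for _ in range(100_000)]
dist = run_length_distribution(bits)

# Text histogram of the shortest run lengths, scaled to the most common one.
peak = max(dist.values())
for length in sorted(dist)[:8]:
    bar = "#" * (50 * dist[length] // peak)
    print(f"{length:2d} {dist[length]:6d} {bar}")
```

For fair, independent bits the count should roughly halve with each extra unit of run length, which makes the histogram a quick visual sanity check on a generator.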
Jupyter+Julia+C+Python Example: Takeaways
§ Highly interactive work inside Jupyter to explore model features
§ Julia for glue and for productive, fast custom processing
§ C library for low-level, heavy-duty, fastest computation
§ Python for productive processing (excellent libraries) and plotting
§ Choice and mix of languages is up to the user – Jupyter is language agnostic
§ All together – clean, productive work, fast computing, insights
USE CASE: MKL IN JUPYTER (ON KNIGHTS LANDING)
MKL Usage Scenarios
§ MKL (Math Kernel Library) – high-performance library for basic math functions
– BLAS, LAPACK, FFT, vector functions (VML), and vector random number generators (VSL)
– C/C++ and Fortran APIs
§ (Tentatively) most productive usage – Intel Distribution for Python, with Continuum (Anaconda), with hooks to MKL, IPP, DAAL
– Seamlessly installs the Jupyter toolbox for Intel Python (use it as above)
– Near-native (C, Fortran) performance
§ Direct API calls to MKL functions from Jupyter can be beneficial
– True native performance
– Control over more low-level details
MKL Function Call From Jupyter Example: C API
§ MKL parallel vector random number generators for Monte Carlo simulators
§ MKL VSL functions come with examples – wrapping an MKL function in C is easy
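The MKL wrapper itself is not in the extracted text. To show the general shape of a Monte Carlo run split across several random streams, here is a minimal pure-Python sketch. It does not call MKL: Python's `random.Random` with distinct seeds stands in for the library's parallel streams, and the function names, seeds, and sizes are assumptions. Distinct seeds are a convenience here, not a guarantee of statistically independent streams.

```python
import random

def count_ones(seed, n_bits, p_one):
    """Draw n_bits Bernoulli(p_one) bits from one private stream; count the ones."""
    stream = random.Random(seed)
    return sum(stream.random() < p_one for _ in range(n_bits))

def estimate_p_one(n_streams, bits_per_stream, p_one, base_seed=2016):
    """Combine per-stream counts into a single estimate of P(bit = 1)."""
    seeds = [base_seed + k for k in range(n_streams)]
    # Each stream is self-contained, so in a parallel run each call below
    # could be handed to a separate worker.
    ones = sum(count_ones(seed, bits_per_stream, p_one) for seed in seeds)
    return ones / (n_streams * bits_per_stream)

estimate = estimate_p_one(n_streams=4, bits_per_stream=50_000, p_one=0.5)
print(f"estimated P(bit = 1) = {estimate:.4f}")
```

Giving every worker its own generator state avoids sharing one generator across threads, which is the property that makes this kind of simulation easy to parallelize.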
MKL Function Call From Jupyter Example: Data
Yield plot, correlation plot
Quick sanity check – checked
Final plot – insights
Parameter loop – 10 sims, 1e12 bits each, < 1 hour on KNL
USE CASE: KNIGHTS LANDING: ACCELERATION AND SOFTWARE INFRASTRUCTURE
KNL Xeon Phi CPU Experience (Real, Hands On)
§ KNL is a general-purpose CPU – any software development you can run on Xeon/i7 you can also run on KNL
– Web browsing, Eclipse, Intel Parallel Studio, …, you name it
§ Serial tasks run slower on KNL than on Xeon/i7, but not terribly slow
§ Highly parallel tasks run several times faster than on Xeon/i7 (same optimized SW)
§ Binary compatible with Intel 64
– Download for Xeon/i7, run on KNL
– Compile on/for Xeon/i7, run on KNL
§ For typical OpenMP tasks (as above, MKL), ~3x acceleration over Xeon/i7 Haswell is observed on KNL out of the box; tuning (AVX-512, etc.) can give more
§ The full Intel Python distribution, Julia, and Jupyter install seamlessly on KNL
KNL Xeon Phi CPU Experience – Simple Tuning
§ Explicitly use only MCDRAM memory for an application
– Use NUMA control: “numactl --membind 1 app_name”
– That gives ~7x acceleration with just 3 more words (for the parallel Monte Carlo example above with MKL; results may vary for different tasks, as shown in a presentation from NERSC)
§ Use SIMD-friendly functions from MKL
– Combined with NUMA control, this gives ~9x acceleration for the examples above
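When the application is launched from a notebook or script rather than a shell, the same NUMA binding can be added programmatically. The sketch below only builds and optionally runs the command line from the slide; the helper names and the `./app_name` path are hypothetical. It also assumes, as the slide does, that NUMA node 1 is the MCDRAM node. This depends on how the machine is configured, so it is worth confirming with `numactl --hardware` first.

```python
import shutil
import subprocess

def mcdram_command(app_argv, node=1):
    """Prefix a command with the NUMA memory binding used on the slide."""
    return ["numactl", "--membind", str(node), *app_argv]

def run_on_mcdram(app_argv, node=1):
    """Run the command under numactl if it is installed; otherwise return None."""
    if shutil.which("numactl") is None:
        return None
    return subprocess.run(mcdram_command(app_argv, node), check=False)

# Build (but do not run) the command for a hypothetical application binary.
command = mcdram_command(["./app_name"])
print(" ".join(command))
```

Note that `--membind` is strict: allocations that do not fit in the bound node fail instead of spilling to other memory, so this form suits applications whose working set fits in MCDRAM.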
Call for Actions
§ Jupyter is evolving – keep an eye on, and help with, its next incarnation, JupyterLab
– More interactive widgets for interactive work and applications
– More integration with the OS/shell
– IDE for code debugging – removing the need for another IDE?
§ Somewhat better support for Xeon Phi is needed
§ What else is needed from HPC?
– HPC community feedback is welcome
THANK YOU FOR YOUR TIME
Oleg Mikulchenko
www.intel.com/hpcdevcon