Date posted: 21-Jan-2018
Category: Technology
Uploaded by: inside-bigdatacom
Intel Confidential CNDA Required
AI and the Virtuous Cycle of Compute

- Cloud & data center
- Things & devices
- Dense compute
- Scripting frameworks
- Training
- Inferencing
AI needs more compute, faster: 55% revenue CAGR … >$47 billion in 2020 *

* Source: IDC Worldwide Semiannual Cognitive/Artificial Intelligence Systems Spending Guide, Oct 2016
Intel's AI R&D Stack

- Science: cognition and neuroscience
- Computer science: scalable algorithms for learning and decision making
- Computer systems: end-to-end pipeline for data management, model training, and deployment
AI: What makes it hard and fun!
Compute architecture needs of AI

- Reducing arithmetic precision while preserving accuracy: all 32-bit → 16, 8, 4, 2 …
- Domain-specific architectures → traditional, neuromorphic, quantum
- Strong-scaling AI to HPC scale: larger batch sizes and higher-order methods
- Delivering performance-productivity: scripting languages and spatial architectures
Productivity and Scaling needs of AI
Deep Learning at 15 PF* (with NERSC, Stanford, and Univ. of Montreal)

Scientific achievement:
- Signal-vs-background classification for LHC datasets exceeds physics cuts
- Pattern discovery for climate data

Methods achievement:
- Hybrid parameter update strategy
- Supervised and semi-supervised architectures

CS achievement:
- Intel Caffe + MLSL optimized on KNL
- ~2 TF peak on a single KNL node
- ~15 PF peak on ~9300 nodes
* “Petascale Deep Learning” Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, and Pradeep Dubey, accepted at Supercomputing 2017
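The "hybrid parameter update strategy" above mixes synchronous and asynchronous updates; its synchronous core is an allreduce-style gradient average across workers. A minimal NumPy sketch of that step is below — the function name and shapes are illustrative, not the paper's code.

```python
import numpy as np

def averaged_update(params, worker_grads, lr=0.01):
    """Synchronous data-parallel SGD step: average per-worker gradients
    (as an MLSL allreduce would) and apply one update.
    Illustrative sketch only, not the cited implementation."""
    mean_grad = np.mean(worker_grads, axis=0)  # allreduce-style average
    return params - lr * mean_grad

# Toy example: 4 workers, 3 parameters
params = np.array([1.0, -2.0, 0.5])
grads = np.array([[0.1, 0.2, -0.1],
                  [0.3, 0.0, -0.3],
                  [0.1, 0.4, -0.1],
                  [0.1, 0.2, -0.1]])
new_params = averaged_update(params, grads, lr=0.1)
print(new_params)
```

At ~9300 nodes the interesting engineering is in how that average is computed and overlapped with compute, which is what MLSL provides on KNL.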
Reduced-precision needs for DL training

What motivates us to look beyond FP16 *:

[Figure: horizontal bars indicating the dynamic range covered by the 16-bit mantissa]
* https://arxiv.org/pdf/1711.02213.pdf
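The precision trade-off can be seen in miniature with symmetric integer quantization: fewer bits means a coarser grid and a larger worst-case rounding error. This is a generic sketch, not the scheme of the cited paper.

```python
import numpy as np

def quantize_symmetric(x, bits=8):
    """Round a float tensor onto a signed integer grid and back.
    Generic reduced-precision illustration, not the paper's method."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale, scale

np.random.seed(0)
x = np.random.randn(1000).astype(np.float32)
for bits in (16, 8, 4, 2):
    xq, _ = quantize_symmetric(x, bits)
    print(f"{bits}-bit max abs error: {np.max(np.abs(x - xq)):.4f}")
```

Preserving model accuracy at 8 bits and below typically needs more than this (per-channel scales, calibration, or quantization-aware training), which is why the "while preserving accuracy" clause is the hard part.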
Productivity Challenge Motivation

Challenge #1: Domain experts are not professional software programmers. Adoption of Python continues to grow among domain experts and developers for its productivity benefits.

Challenge #2: Python performance limits migration to production systems. State-of-the-art algorithms are changing so fast that 50% of ML programmers implement algorithms "from scratch". *

* Source: State of the Developer Nation Q1 2017, http://vmob.me/DE1Q17Mobile
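The performance gap behind Challenge #2 is easy to demonstrate: the same inner product written as an interpreted Python loop versus a vectorized NumPy call. The benchmark below is a toy illustration (timings vary by machine), not a claim about any specific workload.

```python
import time
import numpy as np

def dot_pure_python(a, b):
    """Naive interpreter-bound inner product (one bytecode loop per element)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

n = 1_000_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

t0 = time.perf_counter()
r1 = dot_pure_python(a, b)
t1 = time.perf_counter()
r2 = float(a @ b)                 # vectorized: dispatches to optimized BLAS
t2 = time.perf_counter()

print(f"pure Python: {t1 - t0:.3f}s, NumPy: {t2 - t1:.4f}s")
print(f"results agree: {abs(r1 - r2) < 1e-4}")
```

The productivity appeal is that both versions are short Python; the production problem is that only the vectorized one reaches hardware speed.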
Celeste: first Julia application to hit 1 PF* (in collaboration with NERSC, UCB, MIT, and Julia Computing)

Scientific achievement:
- First catalog with parameter and uncertainty estimates for over 300M objects
- 55 TB SDSS dataset processed in 15 minutes
- DESI instrument will use the catalog for target selection

Methods achievement:
- Bayesian inference on the world's largest generative model (in science)
- Joint estimation of billions of parameters

CS achievement:
- Code written in Julia, optimized for execution on KNL
- Code scaled on 9300 Cori KNL nodes

* Cataloging the Visible Universe through Bayesian Inference at Petascale in Julia; https://www.youtube.com/watch?v=uecdcADM3hY&feature=youtu.be
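The essence of Celeste's output — a parameter estimate plus an uncertainty for each object — can be illustrated with the simplest possible Bayesian model: a conjugate Gaussian posterior for one object's brightness. This is a toy stand-in in Python (the real code is Julia), and the names and numbers are invented for illustration; Celeste's actual generative model is far richer.

```python
import numpy as np

def gaussian_posterior(obs, prior_mu, prior_var, noise_var):
    """Closed-form posterior for a Gaussian mean with known noise variance.
    Toy analogue of per-object parameter + uncertainty estimation;
    NOT Celeste's model."""
    n = len(obs)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(obs) / noise_var)
    return post_mu, post_var

np.random.seed(1)
true_flux = 5.0
obs = true_flux + np.random.randn(50)   # 50 noisy measurements of one object
mu, var = gaussian_posterior(obs, prior_mu=0.0, prior_var=100.0, noise_var=1.0)
print(f"posterior mean {mu:.2f} +/- {np.sqrt(var):.2f}")
```

Scaling this idea to joint inference over billions of correlated parameters on 9300 nodes is what makes the variational/MCMC machinery and the Julia performance work necessary.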
Architectural Choices Growing …

- 1D SIMD. Such as: AVX-512. For: short vector processing.
- 2D Tile. Such as: Nervana Lake Crest. For: dense matrix algebra.
- Configurable fabric. Such as: Stratix-10 FPGA. For: irregular spatial parallelism.
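The "2D Tile" category schedules dense matrix algebra as fixed-size sub-blocks. A blocked matrix multiply in NumPy sketches the tiling idea in software; tile size and dimensions here are illustrative (and assumed divisible), not a model of any particular chip.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Blocked GEMM: accumulate tile x tile sub-products, mirroring how a
    2D-tile architecture streams sub-blocks through a matrix engine.
    Assumes all dimensions are divisible by the tile size."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(0, n, tile):          # tile rows of C
        for j in range(0, m, tile):      # tile columns of C
            for p in range(0, k, tile):  # accumulate along the shared dim
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)
```

In hardware the same decomposition keeps each tile's operands resident in local memory, which is why dense matrix algebra maps so well onto 2D tile designs.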