Date posted: 21-Jan-2018
Category: Technology
Uploaded by: inside-bigdatacom
Intel Confidential CNDA Required
AI and the Virtuous Cycle of Compute

- Cloud & data center
- Things & devices
- Dense compute
- Scripting frameworks
- Training
- Inferencing
AI needs more compute, faster: 55% revenue CAGR … >$47 billion in 2020 *

* Source: IDC Worldwide Semiannual Cognitive/Artificial Intelligence Systems Spending Guide, Oct 2016
Intel's AI R&D Stack

- Science: cognition and neuroscience
- Computer science: scalable algorithms for learning and decision making
- Computer systems: end-to-end pipeline for data management, model training, and deployment
AI: What makes it hard and fun!
Compute architecture needs of AI

- Reducing arithmetic precision while preserving accuracy: all 32-bit → 16, 8, 4, 2 …
- Domain-specific architectures → traditional, neuromorphic, quantum
- Strong-scaling AI to HPC scale: larger batch sizes and higher-order methods
- Delivering performance-productivity: scripting languages and spatial architectures
Productivity and Scaling needs of AI
Deep Learning at 15 PF* (with NERSC, Stanford, and Univ. of Montreal)

Scientific achievement:
- Signal-vs-background classification for LHC datasets exceeds physics cuts
- Pattern discovery for climate data

Methods achievement:
- Hybrid parameter update strategy
- Supervised and semi-supervised architectures

CS achievement:
- Intel Caffe + MLSL optimized on KNL
- ~2 TF peak on a single KNL node
- ~15 PF peak on ~9300 nodes
* “Petascale Deep Learning” Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, and Pradeep Dubey, accepted at Supercomputing 2017
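The "hybrid parameter update strategy" above mixes synchronous and asynchronous updates; its synchronous core is an allreduce-style gradient average across workers. A minimal NumPy sketch of that step is below — the function name and shapes are illustrative, not the paper's code.

```python
import numpy as np

def averaged_update(params, worker_grads, lr=0.01):
    """Synchronous data-parallel SGD step: average per-worker gradients
    (as an MLSL allreduce would) and apply one update.
    Illustrative sketch only, not the cited implementation."""
    mean_grad = np.mean(worker_grads, axis=0)  # allreduce-style average
    return params - lr * mean_grad

# Toy example: 4 workers, 3 parameters
params = np.array([1.0, -2.0, 0.5])
grads = np.array([[0.1, 0.2, -0.1],
                  [0.3, 0.0, -0.3],
                  [0.1, 0.4, -0.1],
                  [0.1, 0.2, -0.1]])
new_params = averaged_update(params, grads, lr=0.1)
print(new_params)
```

At ~9300 nodes the interesting engineering is in how that average is computed and overlapped with compute, which is what MLSL provides on KNL.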
Reduced-precision needs for DL training

What motivates us to look beyond FP16 *:

[Figure: horizontal bars indicating the dynamic range covered by the 16-bit mantissa]
* https://arxiv.org/pdf/1711.02213.pdf
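The precision trade-off can be seen in miniature with symmetric integer quantization: fewer bits means a coarser grid and a larger worst-case rounding error. This is a generic sketch, not the scheme of the cited paper.

```python
import numpy as np

def quantize_symmetric(x, bits=8):
    """Round a float tensor onto a signed integer grid and back.
    Generic reduced-precision illustration, not the paper's method."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale, scale

np.random.seed(0)
x = np.random.randn(1000).astype(np.float32)
for bits in (16, 8, 4, 2):
    xq, _ = quantize_symmetric(x, bits)
    print(f"{bits}-bit max abs error: {np.max(np.abs(x - xq)):.4f}")
```

Preserving model accuracy at 8 bits and below typically needs more than this (per-channel scales, calibration, or quantization-aware training), which is why the "while preserving accuracy" clause is the hard part.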
Productivity Challenge Motivation

Challenge #1: Domain experts are not professional software programmers. Adoption of Python continues to grow among domain experts and developers for its productivity benefits.

Challenge #2: Python performance limits migration to production systems. State-of-the-art algorithms are changing so fast that 50% of ML programmers implement algorithms "from scratch". *

* Source: State of the Developer Nation Q1 2017, http://vmob.me/DE1Q17Mobile
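The performance gap behind Challenge #2 is easy to demonstrate: the same inner product written as an interpreted Python loop versus a vectorized NumPy call. The benchmark below is a toy illustration (timings vary by machine), not a claim about any specific workload.

```python
import time
import numpy as np

def dot_pure_python(a, b):
    """Naive interpreter-bound inner product (one bytecode loop per element)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

n = 1_000_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

t0 = time.perf_counter()
r1 = dot_pure_python(a, b)
t1 = time.perf_counter()
r2 = float(a @ b)                 # vectorized: dispatches to optimized BLAS
t2 = time.perf_counter()

print(f"pure Python: {t1 - t0:.3f}s, NumPy: {t2 - t1:.4f}s")
print(f"results agree: {abs(r1 - r2) < 1e-4}")
```

The productivity appeal is that both versions are short Python; the production problem is that only the vectorized one reaches hardware speed.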
Celeste: first Julia application to hit 1 PF* (in collaboration with NERSC, UCB, MIT, and Julia Computing)

Scientific achievement:
- First catalog with parameter and uncertainty estimates for over 300M objects
- 55 TB SDSS dataset processed in 15 minutes
- DESI instrument will use the catalog for target selection

Methods achievement:
- Bayesian inference on the world's largest generative model (in science)
- Joint estimation of billions of parameters

CS achievement:
- Code written in Julia, optimized for execution on KNL
- Code scaled on 9300 Cori KNL nodes

* Cataloging the Visible Universe through Bayesian Inference at Petascale in Julia; https://www.youtube.com/watch?v=uecdcADM3hY&feature=youtu.be
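The essence of Celeste's output — a parameter estimate plus an uncertainty for each object — can be illustrated with the simplest possible Bayesian model: a conjugate Gaussian posterior for one object's brightness. This is a toy stand-in in Python (the real code is Julia), and the names and numbers are invented for illustration; Celeste's actual generative model is far richer.

```python
import numpy as np

def gaussian_posterior(obs, prior_mu, prior_var, noise_var):
    """Closed-form posterior for a Gaussian mean with known noise variance.
    Toy analogue of per-object parameter + uncertainty estimation;
    NOT Celeste's model."""
    n = len(obs)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(obs) / noise_var)
    return post_mu, post_var

np.random.seed(1)
true_flux = 5.0
obs = true_flux + np.random.randn(50)   # 50 noisy measurements of one object
mu, var = gaussian_posterior(obs, prior_mu=0.0, prior_var=100.0, noise_var=1.0)
print(f"posterior mean {mu:.2f} +/- {np.sqrt(var):.2f}")
```

Scaling this idea to joint inference over billions of correlated parameters on 9300 nodes is what makes the variational/MCMC machinery and the Julia performance work necessary.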
Architectural Choices Growing …

- 1D SIMD. Such as: AVX-512. For: short vector processing.
- 2D Tile. Such as: Nervana Lake Crest. For: dense matrix algebra.
- Configurable fabric. Such as: Stratix-10 FPGA. For: irregular spatial parallelism.
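The "2D Tile" category schedules dense matrix algebra as fixed-size sub-blocks. A blocked matrix multiply in NumPy sketches the tiling idea in software; tile size and dimensions here are illustrative (and assumed divisible), not a model of any particular chip.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Blocked GEMM: accumulate tile x tile sub-products, mirroring how a
    2D-tile architecture streams sub-blocks through a matrix engine.
    Assumes all dimensions are divisible by the tile size."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(0, n, tile):          # tile rows of C
        for j in range(0, m, tile):      # tile columns of C
            for p in range(0, k, tile):  # accumulate along the shared dim
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)
```

In hardware the same decomposition keeps each tile's operands resident in local memory, which is why dense matrix algebra maps so well onto 2D tile designs.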