© Hortonworks Inc. 2011 – 2014. All Rights Reserved
Q&A box is available for your questions
Webinar will be recorded for future viewing
Thank you for joining!
We’ll get started soon…
Combine SAS High-Performance Capabilities with Hadoop YARN
We do Hadoop.
Your speakers…
Arun Murthy, Founder and Architect Hortonworks @acmurthy
Paul Kent, Vice President Big Data SAS @hornpolish
Agenda
• Introduction to YARN
• SAS Workloads on the Cluster
• SAS Workloads: Resource Settings
• SAS and YARN
• YARN Futures
• Next Steps
The 1st Generation of Hadoop: Batch
HADOOP 1.0 – Built for Web-Scale Batch Apps
A single batch application per cluster, bound to its own HDFS; interactive and online workloads each require their own Single App + HDFS stack.
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
Hadoop MapReduce Classic
JobTracker
§ Manages cluster resources and job scheduling
TaskTracker
§ Per-node agent
§ Manages tasks
MapReduce Classic: Limitations
Scalability
§ Maximum cluster size: 4,000 nodes
§ Maximum concurrent tasks: 40,000
§ Coarse synchronization in JobTracker
Availability
§ A JobTracker failure kills all queued and running jobs
Resource utilization
§ Hard partition of resources into map and reduce slots
§ Low resource utilization
Alternate paradigms and services
§ No support for frameworks other than MapReduce
§ Iterative applications implemented using MapReduce are 10x slower
Our Vision: Hadoop as Next-Gen Platform
Hadoop 1 – MapReduce (Cluster Resource Management & Data Processing) over HDFS (Hadoop Distributed File System)
• Silos & largely batch
• Single processing engine: Script (Pig), SQL (Hive), and all other apps (HBase, Accumulo, Storm, Solr, etc.) funnel through MapReduce
Hadoop 2 – YARN: Data Operating System (Cluster Resource Management) over HDFS, spanning nodes 1 … N
• Multiple engines, single data set
• Batch, interactive & real-time
• Script (Pig), SQL (Hive), Java (Cascading) – on Tez; Real-time: HBase; other engines (Accumulo, Storm, Solr, Spark); ISV engines
YARN: Taking Hadoop Beyond Batch
Applications Run Natively IN Hadoop
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH (MapReduce)
INTERACTIVE (Tez)
STREAMING (Storm, S4,…)
GRAPH (Giraph)
IN-MEMORY (Spark)
HPC MPI (OpenMPI)
ONLINE (HBase)
OTHER (Search) (Weave…)
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
YARN
Hortonworks Data Platform
YARN: Data Operating System (Cluster Resource Management), spanning nodes 1 … N, over HDFS (Hadoop Distributed File System)
• Batch: MR
• Script (Pig), SQL (Hive), Java (Cascading) – on Tez
• NoSQL (HBase, Accumulo), Stream (Storm), other engines – on Slider
• In-Memory: Spark
• PaaS: Kubernetes
• SAS: LASR, HPA
5 Key Benefits of YARN
1. Scale
2. New Programming Models & Services
3. Improved cluster utilization
4. Agility
5. Beyond Java
Concepts
Application
§ An application is a temporal job or a service submitted to YARN
§ Examples
– MapReduce job (job)
– HBase cluster (service)
Container
§ Basic unit of allocation
§ Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
– container_0 = 2 GB, 1 CPU
– container_1 = 1 GB, 6 CPU
§ Replaces the fixed map/reduce slots
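As a concrete illustration of per-container requests replacing fixed slots: MapReduce on YARN sizes its map and reduce containers through configuration. A minimal mapred-site.xml sketch, with illustrative values:

```xml
<!-- Illustrative mapred-site.xml fragment: each task asks YARN for a
     container of a given size instead of occupying a fixed slot. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- 2 GB per map-task container -->
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>
```

Other frameworks make the equivalent request through the YARN client APIs rather than through MapReduce configuration.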
Design Centre
Split up the two major functions of the JobTracker
§ Cluster resource management
§ Application life-cycle management
MapReduce becomes a user-land library
YARN Architecture – Walkthrough
The ResourceManager, with its pluggable Scheduler, arbitrates resources across the cluster. Clients (e.g. Client2) submit applications; each application runs its own ApplicationMaster (AM 1, AM 2) in a container, and each AM negotiates further containers (Container 1.1–1.3, Container 2.1–2.4) that are launched and supervised by the per-node NodeManagers.
Multi-Tenancy with YARN
Economics as queue capacity
§ Hierarchical queues
SLAs
§ Preemption
Resource isolation
§ Linux: cgroups
§ MS Windows: Job Control
§ Roadmap: virtualization (Xen, KVM)
Administration
§ Queue ACLs
§ Run-time re-configuration of queues
§ Charge-back
Capacity Scheduler – Hierarchical Queues
The ResourceManager's Scheduler divides cluster capacity across a queue hierarchy, e.g.:
root
├─ Adhoc 10%
├─ DW 70%
│   ├─ Dev 10%
│   ├─ Reserved 20%
│   └─ Prod 70%
│       ├─ P0 70%
│       └─ P1 30%
└─ Mrkting 20%
    ├─ Dev 20%
    └─ Prod 80%
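The top level of such a hierarchy is expressed in capacity-scheduler.xml; a minimal sketch, taking the queue names and percentages from the slide and leaving everything else at defaults:

```xml
<!-- Illustrative capacity-scheduler.xml fragment for the top-level queues. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>Adhoc,DW,Mrkting</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Adhoc.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.DW.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Mrkting.capacity</name>
  <value>20</value>
</property>
```

The run-time re-configuration mentioned under Administration is `yarn rmadmin -refreshQueues`, which reloads this file without restarting the ResourceManager.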
YARN Applications
Data processing applications and services
§ Services: Slider
§ Real-time event processing: Storm, S4, other commercial platforms
§ Tez: a generic framework to run a complex DAG
§ MPI: OpenMPI, MPICH2
§ Master-Worker
§ Machine learning: Spark
§ Graph processing: Giraph
§ Enabled by allowing the use of paradigm-specific ApplicationMasters
Run all on the same Hadoop cluster!
Copyright © 2014, SAS Institute Inc. All rights reserved.
SHARE!
Customers are:
• wrapping up POCs
• building bigger clusters
• assembling their Data { Lake, Reservoir }
• and want their software to SHARE the cluster
SAS Workloads on the Cluster
SAS Workloads on the Cluster - Video
SAS Workloads on the Cluster
Some requests are for a significant slice of the cluster:
§ Reservation will be ALL DAY, ALL WEEK, ALL MONTH?
§ Memory typically fixed (15% of cluster)
§ CPU floor; would like the spare capacity when available
Some requests are more short-term:
§ Memory can be estimated
§ Duration can be capped
§ CPU floor; would like spare capacity
SAS Workloads on the Cluster
How much should you reserve? Not a perfect science yet.
Long-running?
§ LASR server: size by percent of total memory
More like a batch request?
§ HPA procedure: size by anecdotal experience
SAS Workloads – Resource Settings
if [ "$USER" = "lasradm" ]; then
    # Custom settings for running under the lasradm account.
    export TKMPI_ULIMIT="-v 50000000"
    export TKMPI_MEMSIZE=50000
    export TKMPI_CGROUP="cgexec -g cpu:75"
fi

# if [ "$TKMPI_APPNAME" = "lasr" ]; then
#     # Custom settings for a lasr process running under any account.
#     export TKMPI_ULIMIT="-v 50000000"
#     export TKMPI_MEMSIZE=50000
#     export TKMPI_CGROUP="cgexec -g cpu:75"
# fi
SAS Workloads – Resource Settings
YARN: Taking Hadoop Beyond Batch
Applications Run Natively IN Hadoop
HDFS2 (Redundant, Reliable Storage)
YARN
BATCH (MapReduce)
INTERACTIVE (Tez)
STREAMING (Storm, S4,…)
GRAPH (Giraph)
IN-MEMORY (Spark)
ONLINE (HBase)
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
YARN – Delegated Container Model (walkthrough)
1. allocate! – The ApplicationMaster (AM 1) sends an allocate request to the ResourceManager's Scheduler.
2. container! – The ResourceManager responds with an allocated container.
3. startContainer! – In the standard model, the AM asks the target NodeManager to launch the container (Container 1.1).
4. delegateContainer! – In the delegated model, the AM instead hands the allocated container to an already-running service (ServiceX) on that node.
5.–6. ServiceX expands into the delegated container's resources and runs the work within its own process.
PaaS – Kubernetes-on-YARN
YARN as the default enterprise-class scheduler and resource manager for Kubernetes and OpenShift 3
§ First-class support for containerization and mainstream PaaS
§ Updated Go language bindings for YARN
§ Uses the container delegation model
Labels – Constraint Specifications
NodeManagers can be tagged with labels (e.g. "w/ GPU"). A MapReduce AM (MR AM 1) runs its tasks (map 1.1, map 1.2, reduce 1.1) on ordinary nodes, while a deep-learning AM (DL-AM) asks the ResourceManager's Scheduler for containers (DL 1.1–1.3) constrained to GPU-labeled nodes.
YARN
Hortonworks Data Platform (recap)
[Diagram repeated from earlier: YARN: Data Operating System over HDFS, with Tez, Slider, Spark, Kubernetes, and SAS LASR/HPA engines alongside batch MR]
Next Steps…
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about SAS & Hortonworks http://hortonworks.com/partner/SAS/
Contact us: [email protected]