Containers and Hadoop
Hadoop virtualization, done right!
Dinesh Subhraveti
Altiscale Inc.
“Brief History of Containers”
Timeline: 2001, 2002, 2003, 2004, 2005
First implementation of containers based on syscall interposition (Columbia)
First research paper on Linux Containers (OSDI’02)
First container-based distributed checkpointing (HP Labs)
Enterprise Linux Container solution (Meiosys)
IBM acquires Meiosys; focus shifted to AIX
Most core kernel changes finally made it into the Linux mainline
Container Renaissance
“Datacenter is the Computer”
“The new computer needs an OS!”
Computer
OS
Mesos, Kubernetes, YARN
Containers: Enabler of the Datacenter OS
Computer
OS
Processes
Containers: isolated abstractions
Why not Virtual Machines? Application-hardware misalignment
Diagram: Application on a Hypervisor vs. Application on a Container Host
Applications have round edges: system call interface
Hypervisors expose square holes: hardware interface
Containers: lightweight abstraction without IO overhead or startup latency
Why not Virtual Machines? Layers of Intermediate Software
VMs (with the unwelcome Guest OS): Application -> Guest File System -> Guest Driver -> VM Exit (Context Switch) -> Virtual Device -> Image Format Interpreter -> iSCSI, NFS -> Host
Containers: Application -> Host
High IO overhead due to many intermediate layers
Why not Virtual Machines? The Unwelcome Guest OS
Slow startup time
Guest OS licensing and maintenance burden
Poor scalability
High resource consumption due to duplication
Obfuscated network / storage / compute topologies
Application semantic information is lost
Evolution of Hadoop from MapReduce to YARN
Hadoop: Resource Manager, MapReduce
YARN: MapReduce, Spark, HBase, ...
Isolation is an immediate challenge
Containers on YARN
Containers provide a simple and elegant solution
Container Virtualization
Containers on YARN: Node Manager Spawned Tasks as Containers
Node Manager
Container Virtualization
Customer A Task 1
Customer A Task 2
Customer B Task 1
Customer C Task 1
Tasks representing the same job share the same container
Containers on YARN: Advantages
Secure multitenancy
Performance Isolation
Higher utilization via coscheduling IO-bound and CPU-bound tasks
Consistent cluster environment
Isolation of software dependencies / configuration
Reproducible way to define app environment
Rapid provisioning
Privilege Isolation through UID namespaces
❏ Recent addition to the kernel
❏ Superuser in container maps to a regular user on the host
❏ Docker support for UID virtualization
Diagram (UID Virtualization): container root is UID 0 inside the container and appears on the host as a regular user, UID 100; host root remains UID 0
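The mapping on this slide can be sketched in a few lines. This is a minimal illustration, assuming the kernel's `uid_map` convention of (first UID inside the namespace, first UID on the host, range length); the function name `to_host_uid` and the single-entry map are hypothetical and chosen only to mirror the slide's numbers.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first UID inside the namespace, first UID on the host, range length).
UidMap = List[Tuple[int, int, int]]


def to_host_uid(container_uid: int, uid_map: UidMap) -> Optional[int]:
    """Translate a UID seen inside the container to the UID the host sees."""
    for inside_start, host_start, length in uid_map:
        if inside_start <= container_uid < inside_start + length:
            return host_start + (container_uid - inside_start)
    return None  # no entry covers this UID, so it has no host-side mapping


# Hypothetical map matching the slide: container root (UID 0) -> host UID 100.
example_map: UidMap = [(0, 100, 1)]

container_root_on_host = to_host_uid(0, example_map)  # 100
unmapped = to_host_uid(1, example_map)                # None
```

A process that is root inside the container therefore holds only the privileges of UID 100 on the host, which is the privilege isolation the slide describes.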
References
❏ Blog post describing UID virtualization support in Docker
  ❏ https://www.altiscale.com/making-docker-work-yarn/
❏ Apache wiki page tracking work status across Docker and YARN projects
  ❏ https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers
❏ JIRA tracking Docker integration into YARN
  ❏ https://issues.apache.org/jira/browse/YARN-1964
❏ Related Docker tickets
  ❏ Several tickets linked from: https://github.com/dotcloud/docker/pull/4572
Questions?
Backup
Containers on Hadoop or Hadoop on Containers?
Hadoop on Separate Physical Clusters
Awesomely secure! Everybody gets private hardware running private services
Customer 1, Customer 2, Customer 3
Hadoop on Separate Physical Clusters
Cannot scale the business this way!
Poor utilization
Customer 1: Utilization: 6 Spare: 0 Unused: 3
Customer 2: Utilization: 1 Spare: 6 Unused: 2
Customer 3: Utilization: 4 Spare: 3 Unused: 2
Host platform is a huge maintenance burden
❖ Customer 1 needs R
❖ Customer 2 needs Matlab
❖ Customer 3 needs ß∂ø…
Container Clusters to Decouple Host from Customer
Each customer gets a container image
❖ Encapsulates customer-specific software and configuration
❖ Host platform remains lean and simple
Poor utilization
Customer 1: Utilization: 6 Spare: 0 Unused: 3
Customer 2: Utilization: 1 Spare: 6 Unused: 2
Customer 3: Utilization: 4 Spare: 3 Unused: 2
Container Clusters to Drive Utilization
Each customer gets a container image
❖ Encapsulates customer-specific software and configuration
❖ Host platform remains lean and simple
Densely pack containers together
Global Pool of Resources
Global Utilization: 11 Spare: 16 Unused: 0
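The global figures follow from the three per-customer rows shown earlier. The sketch below reproduces that arithmetic under one assumption that the slides imply but do not state: once clusters are pooled, capacity that was stranded as "unused" in a private cluster becomes "spare" capacity available to everyone. The `Cluster` type and `pool` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    utilization: int
    spare: int
    unused: int


# Per-customer figures taken from the separate-cluster slides.
clusters = [
    Cluster(utilization=6, spare=0, unused=3),
    Cluster(utilization=1, spare=6, unused=2),
    Cluster(utilization=4, spare=3, unused=2),
]


def pool(private_clusters: List[Cluster]) -> Cluster:
    """Merge private clusters into one global pool.

    Assumption: previously unused capacity counts as spare once shared.
    """
    used = sum(c.utilization for c in private_clusters)
    spare = sum(c.spare + c.unused for c in private_clusters)
    return Cluster(utilization=used, spare=spare, unused=0)


global_pool = pool(clusters)
# Cluster(utilization=11, spare=16, unused=0)
```

Utilization sums to 6 + 1 + 4 = 11, and spare plus unused sums to (0 + 6 + 3) + (3 + 2 + 2) = 16, matching the global figures on the slide.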
Containers with Fine-grain Resources
Global Pool of Resources
❖ Container resource levels adjusted dynamically per customer
  ➢ As dictated by business policy
❖ Fractional resource allocation
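One way to picture fractional allocation is as a CPU-time quota per scheduling period. The deck does not say which mechanism is used; the sketch below assumes Linux CFS bandwidth control, where a container may run for `quota` microseconds in every `period` microseconds. The 100 ms period, the `cfs_quota_us` helper, and the customer names and shares in `policy` are all illustrative assumptions.

```python
DEFAULT_PERIOD_US = 100_000  # assumed 100 ms scheduling period


def cfs_quota_us(cpu_fraction: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """Convert a fractional CPU allocation (0.5 = half a core) into a quota."""
    if cpu_fraction <= 0:
        raise ValueError("cpu_fraction must be positive")
    return int(round(cpu_fraction * period_us))


# Hypothetical business policy: CPU share per customer, adjustable at runtime.
policy = {"customer-1": 0.5, "customer-2": 1.5}
quotas = {name: cfs_quota_us(share) for name, share in policy.items()}

# Adjusting a customer's level dynamically is just recomputing its quota.
policy["customer-1"] = 0.25
quotas["customer-1"] = cfs_quota_us(policy["customer-1"])
```

Because the quota is an integer number of microseconds rather than a whole core, a customer can be given a quarter of a core or one and a half cores, and the value can be changed while the container is running.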
Disaggregated Compute and Storage
Global Pool of Resources
DN (DataNode), NM (NodeManager)
❖ Add more storage to Customer 1 cluster from a storage-rich node
  ➢ While a compute-intensive job from Customer 2 utilizes the available compute capacity on the same node
Independently scale compute and storage