Managing Containers with Helix
Kanak Biscuitwala and Jason Zhang
Apache Helix Committers @ LinkedIn
helix.apache.org
@apachehelix
Intersection of Job Types
[Diagram: Oracle DBs with Backup jobs, feeding ETL jobs into HDFS]
Long-running and batch jobs running together!
Cloud Deployment
[Diagram: machines hosting mixed instances of applications A (online), B (nearline), and C (batch), e.g. DB, Backup, and ETL]
Applications with diverse requirements running together in a datacenter
Processes on Machines
[Diagram: a machine running processes with no isolation, with VM-based isolation, and with container-based isolation]
• Run as individual processes – Poor isolation or poor utilization
• Virtual machines – Better isolation – Xen, Hyper-V, ESX, KVM
• Containers – cgroups – YARN, Mesos – Super lightweight, sized dynamically based on application requirements
Virtualization and containerization significantly improve process isolation and open up possibilities for efficient utilization of physical resources
Container-Based Solution
Container-Based Solution: System Requirements
A: 64 MB, 64 MB, 64 MB
B: 128 MB, 128 MB
C: 256 MB
Container-Based Solution: Allocation
[Diagram: processes of A (3 × 64 MB), B (2 × 128 MB), and C (256 MB) each placed in a right-sized container on a machine]
Containerization is powerful!
But do processes always fit so nicely?
Container-Based Solution: Over-Utilization
[Diagram: Process 1 exceeds its 256 MB container and is relaunched in a larger 384 MB container]
Outcome: Preemption and relaunch
Container-Based Solution: Under-Utilization
[Diagram: Process 1 holds a 384 MB container while needing far less; Process 2 runs in a 128 MB container]
Outcome: Over-provisioned until restart
Container-Based Solution: Failure
[Diagram: a machine fails, taking down a 64 MB container running A and the 256 MB container running C]
Outcome: Launch containers elsewhere
What about stateful systems?
Container-Based Solution: Failure
[Diagram: the container hosting the MASTER replica fails, leaving only SLAVE replicas running]
Without additional information, the master is unavailable until restart
Container-Based Solution: Scaling
[Diagram: two containers each handling 50% of the load are replaced by three containers each handling 33%]
Outcome: Relaunch with new sharding
Container-Based Solution
Utilization: Application requirements define container size
Fault Tolerance: New container is started
Scaling: Workload is repartitioned and new containers are brought up
Discovery: Existence
We need something finer-grained
The container model provides flexibility within machines, but assumes homogeneity of tasks within containers
Task-Based Solution
Task-Based Solution: System Requirements
A: complete in less than 5 hours
B: always have 2 containers running
C: response time should be less than 50 ms
Task-Based Solution: Allocation
[Diagram: tasks of A, B, and C distributed across containers on a machine]
Task-Based Solution: Over-Utilization
[Diagram: Task 1 outgrows its container and is immediately moved to another running container]
Hide the overhead of a container restart
Task-Based Solution: Under-Utilization
[Diagram: Task 2 is moved from its 128 MB container into the underused 384 MB container alongside Task 1]
Optimize container allocations based on usage
Task-Based Solution: Failure
[Diagram: containers hold leader and standby tasks; when the container with the Task 3 leader fails, the Task 3 standby on a surviving container becomes the leader]
Some systems cannot wait for new containers to start
Task-Based Solution: Discovery
[Diagram: Task 1 has its Leader at N1 and Standby at N2; Task 2 has its Leader at N2 and Standby at N1]
Learn where everything runs, and what state each task is in
Task-Based Solution: Scaling
[Diagram: tasks T1–T6 are redistributed across containers as containers are added]
Comparing Solutions
Utilization:
  Container Solution: Application requirements define container size
  Task + Container Solution: Tasks are distributed as needed to a minimal container set as per SLA
Fault Tolerance:
  Container Solution: New container is started
  Task + Container Solution: Existing task can assume a new state while waiting for new container
Scaling:
  Container Solution: Workload is repartitioned and new containers are brought up
  Task + Container Solution: Tasks are moved across containers
Discovery:
  Container Solution: Existence
  Task + Container Solution: Existence and state
Comparing Solutions: Benefits of a Task-Based Solution
Container reuse: minimize the overhead of a container relaunch
Fine-grained scheduling
Task : Container :: Thread : Process. The task is the right level of abstraction.
Working at task granularity is powerful
We need a reactive approach to resource assignment
How can Helix help?
YARN/Mesos: containers bring flexibility in a machine
Helix: tasks bring flexibility in a container
Task Management with Helix
Application Lifecycle:
Capacity Planning: Allocating physical resources for your load
Provisioning: Deploying and launching tasks
Fault Tolerance: Staying available, ensuring success
State Management: Determining what code should be running and where
Helix Overview: Cluster Roles
[Diagram: the Helix Controller manages tasks running on nodes (Participants); Spectators observe the cluster]
Helix Controller: High-Level Overview
[Diagram: the Rebalancer takes nodes and constraints (e.g. "single master", "no more than 3 tasks per machine") and produces a task assignment]
Helix Controller: Rebalancer

ResourceAssignment computeResourceMapping(
    RebalancerConfig rebalancerConfig,
    ResourceAssignment prevAssignment,
    Cluster cluster,
    ResourceCurrentState currentState);

Based on the current nodes in the cluster and the constraints, find an assignment of tasks to nodes
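The deck does not show a Rebalancer implementation. As a hedged, minimal sketch of the idea behind computeResourceMapping, the following round-robin assignment reduces Helix's Cluster/ResourceAssignment types to plain strings and ignores RebalancerConfig, the previous assignment, and constraints:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch: real Helix rebalancers implement computeResourceMapping
// against Cluster/ResourceAssignment types; here both sides are strings.
public class RoundRobinRebalancer {
  /** Assign each partition to a live node, round-robin. */
  public static Map<String, String> computeResourceMapping(
      List<String> partitions, List<String> liveNodes) {
    Map<String, String> assignment = new LinkedHashMap<>();
    if (liveNodes.isEmpty()) {
      return assignment; // no live nodes: leave everything unassigned
    }
    int i = 0;
    for (String partition : partitions) {
      assignment.put(partition, liveNodes.get(i % liveNodes.size()));
      i++;
    }
    return assignment;
  }
}
```

A real rebalancer would additionally honor constraints (e.g. a single master per partition) and prefer the previous assignment to minimize movement.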
What else do we need?
Helix Controller: What is Missing?
• Dynamic Container Allocation
• Container Isolation
• Automated Service Deployment
• Resource Utilization Monitoring
Helix Controller: Target Provider
Based on some constraints, determine how many containers are required in this system

TargetProviderResponse evaluateExistingContainers(
    Cluster cluster,
    ResourceId resourceId,
    Collection<Participant> participants);

class TargetProviderResponse {
  List<ContainerSpec> containersToAcquire;
  List<Participant> containersToRelease;
  List<Participant> containersToStop;
  List<Participant> containersToStart;
}

Strategies: Fixed, CPU, Memory, Bin Packing
We’re working on integrating with monitoring systems in order to query for usage information
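The Fixed strategy is the simplest target provider. A hedged sketch, with ContainerSpec and Participant reduced to strings and the response trimmed to acquire/release lists, might compute the delta between running and desired containers like this:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a fixed-count target provider: acquire or release containers
// until the number of participants matches a fixed target. The types are
// simplified stand-ins for Helix's ContainerSpec/Participant/Response.
public class FixedTargetProvider {
  public static class Response {
    public final List<String> containersToAcquire = new ArrayList<>();
    public final List<String> containersToRelease = new ArrayList<>();
  }

  public static Response evaluateExistingContainers(
      List<String> participants, int targetCount) {
    Response response = new Response();
    if (participants.size() < targetCount) {
      // Too few containers: request new specs to reach the target.
      for (int i = participants.size(); i < targetCount; i++) {
        response.containersToAcquire.add("container-" + i);
      }
    } else {
      // Too many containers: release the excess participants.
      for (int i = targetCount; i < participants.size(); i++) {
        response.containersToRelease.add(participants.get(i));
      }
    }
    return response;
  }
}
```

A CPU- or memory-based strategy would replace the fixed target with a count derived from the monitoring data mentioned above.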
Helix Controller: Adding a Target Provider
[Diagram: the Target Provider feeds the Rebalancer, alongside constraints and nodes, to produce the task assignment]
How do we use the target provider response?
Helix Controller: Container Provider
Given the container requirements, ensure that the required number of containers is running

ListenableFuture<ContainerId> allocateContainer(ContainerSpec spec);
ListenableFuture<Boolean> deallocateContainer(ContainerId containerId);
ListenableFuture<Boolean> startContainer(ContainerId containerId, Participant participant);
ListenableFuture<Boolean> stopContainer(ContainerId containerId);

Implementations: YARN, Mesos, Local
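As a hedged sketch of what a Local implementation could look like, the following tracks container lifecycle purely in memory (no real processes are launched) and substitutes the JDK's CompletableFuture for Guava's ListenableFuture so the example stays self-contained:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory "local" container provider sketch. ContainerId is a String and
// the ContainerSpec argument is reduced to a memory size that this sketch
// records nowhere; real providers would launch actual processes/containers.
public class LocalContainerProvider {
  public enum State { ALLOCATED, RUNNING, STOPPED }

  private final Map<String, State> containers = new ConcurrentHashMap<>();
  private final AtomicInteger nextId = new AtomicInteger();

  public CompletableFuture<String> allocateContainer(int memoryMb) {
    String id = "container-" + nextId.getAndIncrement();
    containers.put(id, State.ALLOCATED);
    return CompletableFuture.completedFuture(id);
  }

  public CompletableFuture<Boolean> startContainer(String id) {
    return CompletableFuture.completedFuture(
        containers.replace(id, State.RUNNING) != null);
  }

  public CompletableFuture<Boolean> stopContainer(String id) {
    return CompletableFuture.completedFuture(
        containers.replace(id, State.STOPPED) != null);
  }

  public CompletableFuture<Boolean> deallocateContainer(String id) {
    return CompletableFuture.completedFuture(containers.remove(id) != null);
  }

  public State stateOf(String id) {
    return containers.get(id);
  }
}
```

The YARN and Mesos implementations would instead translate these calls into Resource Manager container requests and Mesos offer responses, respectively.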
Helix Controller: Adding a Container Provider
[Diagram: the Target Provider and Container Provider plug into the controller alongside the Rebalancer, constraints, and nodes]
Target Provider + Container Provider = Provisioner
Application Lifecycle: With Helix and the Task Abstraction
Capacity Planning: Target Provider
Provisioning: Container Provider
Fault Tolerance: Existing Helix Controller (enhanced by Provisioner)
State Management: Existing Helix Controller (enhanced by Provisioner)
System Architecture
[Diagram: a client submits a job to the resource provider; a controller container runs the App Launcher, Provisioner, and Rebalancer and issues container requests; participant containers run a Participant Launcher, a Helix Participant, and the app; the controller assigns tasks to participants, and packages are staged in HDFS or a common area]
Helix + YARN: YARN Architecture
[Diagram: the client submits a job to the Resource Manager; Node Managers report node status; the Application Master container issues container requests, assigns work, and reports status; containers grab the app package from HDFS or a common area]
Helix + YARN: Helix + YARN Architecture
[Diagram: the same YARN flow, but the Application Master hosts the Helix Controller and Rebalancer, each worker container hosts a Helix Participant wrapping the app, and the controller assigns tasks to participants]
Helix + Mesos: Mesos Architecture
[Diagram: the Mesos Master offers resources to the Scheduler; Mesos Slaves on slave machines report node status; on an offer response, a slave launches a Mesos Executor, grabbing the executor package from a common area]
Helix + Mesos: Helix + Mesos Architecture
[Diagram: the same Mesos flow, but the Scheduler hosts the Helix Controller, each Mesos Executor hosts a Helix Participant and the app, the controller assigns tasks, and the Helix executor package is grabbed from HDFS or a common area]
Example
Distributed Document Store: Overview
[Diagram: master and slave Oracle replicas of Partitions 0–2, with per-partition Backup tasks and ETL tasks feeding HDFS]
Distributed Document Store: YARN Example
[Diagram: the client submits the job to the YARN Resource Manager; the Application Master hosts the Helix Controller and Rebalancer; Node Manager containers host Helix Participants running Oracle partitions, backup, and ETL tasks]
Distributed Document Store: YAML Specification (the TargetProvider specification)

appConfig: { config: { k1: v1 } }
appPackageUri: 'file://path/to/myApp-pkg.tar'
appName: myApp
services: [DB, ETL] # the task containers
serviceConfigMap: {
  DB: { num_containers: 3, memory: 1024 }, ...
  ETL: { time_to_complete: 5h, ... },
  ...
}
servicePackageURIMap: {
  DB: 'file://path/to/db-service-pkg.tar',
  ...
}
...
Distributed Document Store: Service/Container Implementation

public class MyQueuerService extends StatelessParticipantService {
  @Override
  public void init() { ... }

  @Override
  public void onOnline() { ... }

  @Override
  public void onOffline() { ... }
}
Distributed Document Store: Task Implementation

public class BackupTask extends Task {
  @Override
  public ListenableFuture<Status> start() { ... }

  @Override
  public ListenableFuture<Status> cancel() { ... }

  @Override
  public ListenableFuture<Status> pause() { ... }

  @Override
  public ListenableFuture<Status> resume() { ... }
}
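The slide leaves the task bodies elided. As a hedged, self-contained sketch of one way a cancellable backup-style task could work, the following uses the JDK's CompletableFuture in place of Guava's ListenableFuture and reduces Status to a two-value enum; none of these names come from the Helix API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a cancellable backup-style task: work is split into chunks,
// and a cancellation flag is checked between chunks.
public class SketchBackupTask {
  public enum Status { COMPLETED, CANCELLED }

  private final AtomicBoolean cancelled = new AtomicBoolean(false);

  public CompletableFuture<Status> start(int chunks) {
    return CompletableFuture.supplyAsync(() -> {
      for (int i = 0; i < chunks; i++) {
        if (cancelled.get()) {
          return Status.CANCELLED; // stop between chunks when cancelled
        }
        // ... back up chunk i (elided in this sketch) ...
      }
      return Status.COMPLETED;
    });
  }

  public CompletableFuture<Status> cancel() {
    cancelled.set(true);
    return CompletableFuture.completedFuture(Status.CANCELLED);
  }
}
```

Chunked work with a flag checked at chunk boundaries is one common way to make start/cancel (and, by extension, pause/resume) cooperate without forcibly killing the task.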
Distributed Document Store: State Model-Style Callbacks

public class StoreStateModel extends StateModel {
  public void onBecomeMasterFromSlave() { ... }

  public void onBecomeSlaveFromMaster() { ... }

  public void onBecomeSlaveFromOffline() { ... }

  public void onBecomeOfflineFromSlave() { ... }
}

Distributed Document Store: Spectator (for Discovery)

class RoutingLogic {
  public void write(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes = routingTableProvider.getInstance(partition, "MASTER");
    nodes.get(0).write(request);
  }

  public void read(Request request) {
    partition = getPartition(request.key);
    List<Participant> nodes = routingTableProvider.getInstance(partition);
    random(nodes).read(request);
  }
}
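The getPartition helper used in the routing logic is not defined in the deck. A common hypothetical choice is hash-based partitioning, sketched here (the class and method names are illustrative, not part of Helix):

```java
// Hypothetical getPartition: map a key to one of N partitions by hash.
// Math.floorMod keeps the result non-negative even when hashCode() is
// negative, so the partition index is always in [0, numPartitions).
public class Partitioner {
  public static int getPartition(String key, int numPartitions) {
    return Math.floorMod(key.hashCode(), numPartitions);
  }
}
```

Because the mapping is deterministic, every spectator computes the same partition for a given key and routes it to the same master.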
Helix at LinkedIn
[Diagram: user writes go to Oracle DBs; change capture feeds change consumers, a search index, a data replicator, and backup/restore; ETL loads HDFS for analytics]
Helix at LinkedIn: In Production
Over 1000 instances covering over 30000 database partitions
Over 1000 instances for change capture consumers
As many as 500 instances in a single Helix cluster
(all numbers are per-datacenter)
Summary
•Container abstraction has become a huge win • With Helix, we can go a step further and make
tasks the unit of work • With the TargetProvider and ContainerProvider
abstractions, any popular provisioner can be plugged in
Questions?
Jason [email protected]
Kanak [email protected]
Website helix.apache.org
Dev Mailing List [email protected]
User Mailing List [email protected]
Twitter @apachehelix