Kafka On YARN (KOYA)
An Open Source Initiative to Integrate Kafka & YARN
Thomas Weise
Siyuan Hua
March 4th, 2015
Apache Kafka
“A high-throughput distributed messaging system.”
“Fast, Scalable, Durable, Distributed”
Kafka is a natural fit for delivering events into our stream processing platform.
Kafka feeds Stream Processing
[Diagram: a Kafka cluster of three servers (Server-1, Server-2, Server-3), each holding partitions P1, P2 and P3, feeds a YARN cluster. The YARN cluster shows a Resource Manager and Node Managers hosting the DT AppMaster and DT Containers.]
Problem?
• It is not easy to get started with Kafka
– Initial deployment difficult (build your own tool)
• It is not easy to keep it running
– No central management (status, configuration changes,…)
– No automatic replacement for failed broker
• Operational Inefficiencies
– Resource fragmentation, underutilization
– Common infrastructure not leveraged, extra skill sets
• Adoption Barrier!
Why Kafka on YARN
• YARN enables:
– Horizontal scalability with commodity hardware
– Central resource management with queues, limits and locality preferences
– Framework for achieving fault tolerance and security
• Automate:
– Broker recovery
– Deployment of Kafka clusters
• Integrate:
– User friendly management (alternative to Kafka command line utilities)
Kafka on YARN through Slider
[Diagram: a single YARN cluster with one Resource Manager. Its Node Managers host the DT AppMaster, DT Containers, the Slider AM, and two Kafka servers (Server-1 and Server-2, each with partitions P1, P2 and P3). Each Kafka server is paired with a Slider Agent.]
Why Slider?
• Automates deployment and configuration of components
– Simplify on-demand cluster creation
• Generic AM for long running services
– Management of container failures – automates recovery
– Sticky allocation of components to hosts across AM restart
– Isolation: node labels to pin components to specific set of machines
• Central status
– View all servers in one place
• Areas for improvement
– Anti-affinity support (YARN limitation)
– Agent API documentation
– Flexibility in component instance specification
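The "sticky allocation" point above can be illustrated with a toy model. This is a minimal sketch, not Slider code: `PlacementHistory`, `record` and `choose_host` are hypothetical names invented for illustration. The idea is that a restarted component is sent back to the host it last ran on when that host is still available, which matters for Kafka because a broker keeps its log data on local disk.

```python
from dataclasses import dataclass, field


@dataclass
class PlacementHistory:
    """Toy model: remembers the host each component instance last ran on."""
    last_host: dict = field(default_factory=dict)

    def record(self, instance, host):
        # Called whenever an instance is started on a host.
        self.last_host[instance] = host

    def choose_host(self, instance, free_hosts):
        """Prefer the previous host if it is free, else fall back to any free host."""
        if not free_hosts:
            raise RuntimeError("no free host available for " + instance)
        preferred = self.last_host.get(instance)
        if preferred in free_hosts:
            return preferred
        # Deterministic fallback so the sketch is easy to reason about.
        return sorted(free_hosts)[0]


history = PlacementHistory()
history.record("broker-1", "node-a")
# node-a is still free, so the broker goes back to it.
same = history.choose_host("broker-1", {"node-a", "node-b"})
# node-a is gone, so the broker is placed on another free host.
moved = history.choose_host("broker-1", {"node-b", "node-c"})
```

In this sketch the fallback simply picks the first free host in sorted order; a real scheduler would also weigh resources, labels and locality.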
Configuration Example
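The sketch below is not the example shown in the talk. It is a hedged illustration of what a Slider-style resource specification for Kafka brokers might look like, built in Python and printed as JSON. The component name `KAFKA_BROKER` and the helper `broker_resources` are assumptions made for illustration; the `yarn.*` keys and the overall layout follow the style of Slider's `resources.json` as commonly documented, and should be checked against the Slider and KOYA documentation before use.

```python
import json


def broker_resources(instances, memory_mb, vcores):
    """Build a Slider-style resources spec (illustrative, values are strings)."""
    return {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {},
        "components": {
            "slider-appmaster": {},
            # Hypothetical component name for the Kafka broker role.
            "KAFKA_BROKER": {
                "yarn.role.priority": "1",
                "yarn.component.instances": str(instances),
                "yarn.memory": str(memory_mb),
                "yarn.vcores": str(vcores),
            },
        },
    }


if __name__ == "__main__":
    # Two brokers, each asking YARN for 1024 MB and 1 virtual core.
    print(json.dumps(broker_resources(2, 1024, 1), indent=2))
```

Changing the number of brokers then becomes a matter of changing one value in the specification rather than provisioning machines by hand.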
Demo
Project Status
• Open Source: https://github.com/DataTorrent/koya
• Python Scripts + Configuration
• Works on Hadoop 2.6 through Slider 0.6
• Install: Embedded Slider or Application Package
• First release planned by Q2 2015
• Future Enhancements
– Expanded Status Info through Slider AM
– Explore Kafka management UI options
– Support for Disk as a Resource in YARN - YARN-2139
– Better control over server placement (anti-affinity) - Slider-799
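The anti-affinity item above amounts to a simple constraint: no two brokers should share a host. The following is a minimal sketch of that constraint only, not of any YARN or Slider API; `place_with_anti_affinity` and its parameters are hypothetical names.

```python
def place_with_anti_affinity(brokers, hosts, occupied=()):
    """Assign each broker to a distinct host that does not already run a broker."""
    taken = set(occupied)
    free = [h for h in hosts if h not in taken]
    if len(brokers) > len(free):
        raise ValueError("anti-affinity needs one free host per broker")
    # Pair brokers with free hosts in order; each host is used at most once.
    return dict(zip(brokers, free))


# n1 already runs a broker, so the two new brokers land on n2 and n3.
placement = place_with_anti_affinity(["b1", "b2"], ["n1", "n2", "n3"], occupied=["n1"])
```

The function raises instead of doubling up on a host, which reflects the point of anti-affinity: a single machine failure should take down at most one broker.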
Q & A
Thank You!