Date posted: 16-Apr-2017
Category: Software
Author: otavio-carvalho
Apache Kafka
Free Friday
Luiza Souza / Otávio
Apache Kafka
Apache Kafka is a distributed messaging system
Provides fast, highly scalable and redundant messaging through a pub-sub model
It was built at LinkedIn to be used as a central hub for all of the messaging between their systems
Focus on scalability and fault tolerance
Motivation
Microservices
  "In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery." - Martin Fowler
Monolith First
  Using microservices as a way to decompose monolithic infrastructures
Message Queues
  Asynchronous processing
  Decoupling
  Load balancing
  Scalability
How is it different?
High throughput
  Millions of events per second per node
Fault-tolerance guarantees
  Relies on Apache ZooKeeper for detection of node failures and leader election
  Maintains a structure called the ISR (In-Sync Replica set) in order to tolerate node failures
  (Claims to) tolerate up to f failures with f+1 replicas without losing data
Distributed
  More nodes can be added and the system keeps its high performance and fault-tolerance capabilities
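The "f failures with f+1 replicas" claim can be illustrated with a toy model. This is a minimal sketch, not Kafka's replication protocol: the `Partition` class, its methods and the integer replica ids are all hypothetical names made up for the illustration. It assumes a write is committed only once every in-sync replica has it, so any surviving ISR member holds every committed write.

```python
class Partition:
    """Toy model of one replicated partition (hypothetical, not Kafka's real protocol)."""

    def __init__(self, replica_ids):
        self.logs = {r: [] for r in replica_ids}  # replica id -> its copy of the log
        self.isr = set(replica_ids)               # replicas currently in sync
        self.leader = min(self.isr)

    def write(self, message):
        # Assumption: a write is committed once every in-sync replica has appended it.
        for replica in self.isr:
            self.logs[replica].append(message)

    def fail(self, replica_id):
        # A failed replica leaves the ISR; a failed leader is replaced from the ISR.
        self.isr.discard(replica_id)
        if not self.isr:
            raise RuntimeError("no in-sync replica left")
        if replica_id == self.leader:
            self.leader = min(self.isr)

    def committed(self):
        return list(self.logs[self.leader])


# f = 2 failures tolerated with f + 1 = 3 replicas
p = Partition([0, 1, 2])
p.write("a")
p.fail(0)          # leader fails, replica 1 takes over
p.write("b")
p.fail(1)          # second failure, replica 2 takes over
print(p.committed())
```

After two failures the last replica still serves both committed writes; a third failure leaves no in-sync replica and the sketch raises an error.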
Comparison with AMQP
Broker-centric (AMQP)
  AMQP implementations are usually broker-centric
  Focus on delivery guarantees between producers/consumers
  Transient preferred over durable messages
  Use the broker itself to maintain the state of what has been consumed (via message acknowledgements)
Producer-centric (Kafka)
  Partition a fire hose of event data into durable message brokers with cursors (pointers)
  Supports batch consumers that may be offline, and online consumers that want messages at low latency
  Doesn't have message acknowledgements; it assumes the consumer tracks what has been consumed so far
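The cursor idea in the Kafka column can be sketched in a few lines. This is an in-memory illustration only, not a Kafka client: `Log` and `Consumer` are hypothetical classes. The assumption is that the broker side keeps just an append-only list, while each consumer stores its own offset and advances it after reading.

```python
class Log:
    """Append-only message list; it keeps no per-consumer state."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message

    def read(self, offset, max_messages=10):
        return self.messages[offset:offset + max_messages]


class Consumer:
    """Tracks its own position (cursor) in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # the consumer, not the broker, advances the cursor
        return batch


log = Log()
online = Consumer(log)   # reads as messages arrive
batch = Consumer(log)    # offline for now, catches up later

online_seen = []
for i in range(3):
    log.append(f"event-{i}")
    online_seen.extend(online.poll())

batch_seen = batch.poll()  # one read picks up everything it missed
```

Because the log never deletes a message on read, the online and the batch consumer see the same three events at their own pace.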
Kafka Terminology
Producers
  Processes that publish messages to topics
Consumers
  Processes that read messages from topics
Topic
  Name of the feed to which messages are published
Broker
  Process running on a single machine
Cluster
  Group of brokers working together
Kafka Terminology
Partitions
  Subdivision of topics
  Scalability
  Load balancing
Consumers control their own offsets
Replication
  In-Sync Replica (ISR) sets
Kafka Terminology
Figure 1. A Kafka cluster with 4 brokers, 1 topic and 2 partitions, each with 3 replicas
Use Cases
Messaging
Distributed log / Log aggregation
Change Data Capture
Stream Processing / Event Sourcing
Use Cases - Messaging
Messaging
  Simple queueing
    e.g. queue for sending e-mails
  Tracking user events
  Near real-time metrics
Use Cases - Distributed Log
Distributed log / Log aggregation
  LinkedIn usage
    The whole platform is built around a central log
    13 million messages/sec, 15 gigabytes per sec
    Over 1100 brokers in more than 60 clusters
Use Cases - Change Data Capture
Use Cases - Stream Processing
Stream Processing / Event Sourcing
  LinkedIn's example
  Netflix's example
DEMO
ISSUES
Issues
CAP theorem (Consistency, Availability, Partition tolerance)
  "You can't sacrifice partition tolerance"
Jepsen tests (@aphyr)
  To force a failure in Kafka, the test shrinks the ISR (In-Sync Replica set) to one node (the leader) and then loses the leader itself
  This causes a leader election and a new leader is elected
  It causes Kafka to lose ~50% of the writes made during the partition
  Kafka users usually set a replication factor of 2 or 3 replicas for each partition of a given topic
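The failure sequence described above can be walked through with a toy simulation. This is a sketch of the mechanism only, not of Kafka's code or of the Jepsen test harness: the node names, the `write` helper and the rule "acknowledge once every current ISR member has the message" are assumptions made for the illustration, and the number of lost writes here says nothing about the ~50% figure.

```python
def write(logs, isr, acknowledged, message):
    """Append to every in-sync replica, then acknowledge the write to the producer."""
    for node in isr:
        logs[node].append(message)
    acknowledged.append(message)


logs = {"n1": [], "n2": [], "n3": []}
acknowledged = []
leader, isr = "n1", {"n1", "n2", "n3"}

# Healthy cluster: every replica receives the writes.
write(logs, isr, acknowledged, "w1")
write(logs, isr, acknowledged, "w2")

# Network partition: the followers fall behind and the ISR shrinks to the leader.
isr = {leader}
write(logs, isr, acknowledged, "w3")
write(logs, isr, acknowledged, "w4")

# The leader is lost; a follower that never saw w3 and w4 becomes the new leader.
del logs[leader]
leader, isr = "n2", {"n2"}

lost = [m for m in acknowledged if m not in logs[leader]]
print(lost)
```

The writes acknowledged while the ISR contained only the old leader exist on no surviving node, which is why they disappear after the election.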
THANK YOU
Links
https://aphyr.com/posts/315-jepsen-rabbitmq
https://aphyr.com/posts/293-jepsen-kafka
https://thoughtworks.jiveon.com/people/tbartlet/blog/2015/11/02/project-metamorphosis-with-kafka-spark
https://thoughtworks.jiveon.com/message/1013489
https://medium.com/@ikem/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8#.x4f9ezrwn
https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
http://microservices.io/patterns/microservices.html
http://martinfowler.com/articles/microservices.html
https://engineering.linkedin.com/kafka/running-kafka-scale
https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
http://martinfowler.com/bliki/MonolithFirst.html
https://www.oreilly.com/learning/making-sense-of-stream-processing/page/3/integrating-databases-and-kafka-with-change-data-capture
http://kafka.apache.org/documentation.html
https://github.com/toddpalino/kafkafromscratch/blob/master/Apache%20Kafka%20from%20Scratch.pdf
http://www.javaworld.com/article/3060078/big-data/big-data-messaging-with-kafka-part-1.html
https://sookocheff.com/post/kafka/kafka-in-a-nutshell/
Use Cases - Change Data Capture
Log compaction
Kafka + Kafka Connect
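Log compaction keeps the most recent record for each key, which is what lets a topic act as a changelog of a database table. The function below is a minimal in-memory sketch of that idea, assuming records are plain `(key, value)` tuples; `compact` is a hypothetical helper, not part of Kafka or Kafka Connect.

```python
def compact(log):
    """Keep only the last record per key, preserving the order of the survivors."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [(key, value) for i, (key, value) in enumerate(log) if last_index[key] == i]


changes = [
    ("user1", "a@example.com"),
    ("user2", "b@example.com"),
    ("user1", "c@example.com"),  # later update for user1 supersedes the first record
]
compacted = compact(changes)
print(compacted)
```

Replaying the compacted log yields the same final state per key as replaying the full log, with fewer records to read.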
Partitioning
Custom Partitioner
  Write your own logic
Default Partitioner
  Manual
  Hashing
    The most common approach
    Messages with the same key go to the same partition
  Spraying
    Random partitioning
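The hashing and spraying strategies can be sketched as two small functions. This is an illustration under stated assumptions: `hash_partition` and `spray_partition` are hypothetical names, and `zlib.crc32` is used only as a convenient stable hash, so the partition numbers it produces will not match the ones a real Kafka client would choose.

```python
import random
import zlib


def hash_partition(key: bytes, num_partitions: int) -> int:
    """Hashing: the same key always maps to the same partition."""
    return zlib.crc32(key) % num_partitions


def spray_partition(num_partitions: int, rng=random) -> int:
    """Spraying: pick a partition at random, ignoring any key."""
    return rng.randrange(num_partitions)


num_partitions = 4
first = hash_partition(b"user-42", num_partitions)
second = hash_partition(b"user-42", num_partitions)
sprayed = [spray_partition(num_partitions) for _ in range(5)]
```

Hashing preserves per-key ordering because all messages for a key land in one partition; spraying gives up that ordering in exchange for an even spread of load.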