Apache Kafka - Free Friday

Luiza Souza / Otávio

Apache Kafka

● Apache Kafka is a distributed messaging system
  ○ Provides fast, highly scalable, and redundant messaging through a pub-sub model
● It was built at LinkedIn to serve as the central hub for all messaging between their systems
● Focuses on scalability and fault tolerance

Motivation

● Microservices
  ○ "In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery." - Martin Fowler
● Monolith First
  ○ Using microservices as a way to decompose monolithic infrastructures
● Message Queues
  ○ Asynchronous processing
  ○ Decoupling
  ○ Load balancing
  ○ Scalability

How is it different?

● High throughput
  ○ Millions of events per second per node
● Fault-tolerance guarantees
  ○ Relies on Apache ZooKeeper for detection of node failures and leader election
  ○ Maintains a structure called the ISR (In-Sync Replica Set) in order to tolerate node failures
  ○ (Claims to) tolerate up to f node failures with f+1 replicas without losing data
● Distributed
  ○ More nodes can be added while the system keeps its high-performance and fault-tolerance capabilities
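The f+1 claim above can be illustrated with a small sketch. It assumes a simplified model in which a message counts as committed only once every replica in the ISR has it, so the committed offset (the "high watermark") is the minimum log end offset across the ISR. The function name, broker names, and offsets are hypothetical and chosen only for illustration; this is not Kafka's own code.

```python
def high_watermark(log_end_offsets, isr):
    """Return the offset up to which every in-sync replica holds the data."""
    return min(log_end_offsets[replica] for replica in isr)


# Three replicas, i.e. f + 1 = 3, so f = 2 failures should be survivable.
# Offsets are made up for this example.
log_end_offsets = {"broker-1": 10, "broker-2": 10, "broker-3": 8}
isr = {"broker-1", "broker-2", "broker-3"}

committed = high_watermark(log_end_offsets, isr)

# Whichever single replica survives, it still holds every committed message,
# because the committed offset never exceeds any ISR member's log end offset.
survives_any_f_failures = all(
    log_end_offsets[replica] >= committed for replica in isr
)
```

The guarantee only covers replicas that are still in the ISR; a replica that has been dropped from the set may lag behind the committed offset.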

Comparison with AMQP

● Broker-centric (AMQP)
  ○ AMQP implementations are usually broker-centric
  ○ Focus on delivery guarantees between producers and consumers
  ○ Transient messages preferred over durable messages
  ○ Use the broker itself to maintain the state of what has been consumed (via message acknowledgements)
● Producer-centric (Kafka)
  ○ Partitions a fire hose of event data into durable message brokers with cursors (pointers)
  ○ Supports batch consumers that may be offline, as well as online consumers that want messages at low latency
  ○ Has no per-message acknowledgements; it assumes the consumer tracks what it has consumed so far
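The cursor idea can be sketched in a few lines. This is a minimal in-memory model, not a Kafka client: the class names `PartitionLog` and `Consumer` are hypothetical, and the only point is that the log stores messages while each consumer owns its read position.

```python
class PartitionLog:
    """Append-only log; it keeps no record of what any consumer has read."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset assigned to the new message

    def read(self, offset, max_messages=10):
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Tracks its own cursor (offset) instead of acknowledging messages."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # advance the cursor past what was read
        return batch


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

online = Consumer(log)   # reads small batches as events arrive
offline = Consumer(log)  # was offline, catches up in one large batch

first = online.poll(max_messages=2)
catch_up = offline.poll()
```

Because the position lives with the consumer, rewinding is just `online.offset = 0`; nothing has to be redelivered or un-acknowledged on the broker side.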

Kafka Terminology

● Producers
  ○ Processes that publish messages to topics
● Consumers
  ○ Processes that read messages from topics
● Topic
  ○ Name of the feed to which messages are published
● Broker
  ○ Process running on a single machine
● Cluster
  ○ Group of brokers working together

Kafka Terminology

● Partitions
  ○ Subdivision of topics
    ■ Scalability
    ■ Load balancing
  ○ Consumers control their own offsets
● Replication
  ○ In-Sync Replica (ISR) sets

Kafka Terminology

Figure 1. A Kafka cluster with 4 brokers, 1 topic and 2 partitions, each with 3 replicas
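One way to picture the layout in Figure 1 is a round-robin placement of replicas over brokers. The sketch below is an assumption made for illustration: `assign_replicas` is a hypothetical helper, the broker ids 1 to 4 are invented, and the placement Kafka actually performs may differ.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Place each partition's replicas on consecutive brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


# 4 brokers, 1 topic with 2 partitions, 3 replicas each (as in Figure 1).
layout = assign_replicas(brokers=[1, 2, 3, 4], num_partitions=2, replication_factor=3)
```

Here partition 0 lands on brokers 1, 2, 3 and partition 1 on brokers 2, 3, 4, so no broker holds two replicas of the same partition.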

Use Cases

● Messaging

● Distributed log / Log aggregation

● Change Data Capture

● Stream Processing / Event Sourcing

Use Cases - Messaging

● Messaging
  ○ Simple queueing
    ■ e.g. a queue for sending e-mails
  ○ Tracking user events
  ○ Near real-time metrics

Use Cases - Distributed Log

● Distributed log / Log aggregation
  ○ LinkedIn usage
    ■ The whole platform is built around a central log
    ■ 13 million messages/sec, 15 gigabytes per sec
    ■ Over 1100 brokers in more than 60 clusters

Use Cases - Change Data Capture

Use Cases - Stream Processing

● Stream Processing / Event Sourcing

LinkedIn's example / Netflix's example

DEMO

Issues

● CAP theorem (Consistency, Availability, Partition tolerance)
  ○ "You can't sacrifice partition tolerance"
● Jepsen tests (@aphyr)
  ○ To force data loss in Kafka, the test shrinks the ISR (In-Sync Replica Set) to a single node (the leader) and then loses the leader itself
    ■ This triggers a leader election, and a new leader is elected
      ● Kafka loses ~50% of the writes made during the partition
    ■ Kafka users usually set a replication factor of 2 or 3 for each partition of a given topic
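The failure scenario above can be replayed in a toy model. Everything here is an assumption made for illustration: the `Partition` class, the node names, and the rule that a replica outside the ISR may be elected when no in-sync replica survives. It is not the Jepsen test harness and does not attempt to reproduce the ~50% figure.

```python
class Partition:
    """Toy model of one replicated partition with a leader and an ISR."""

    def __init__(self, replicas):
        self.logs = {replica: [] for replica in replicas}
        self.leader = replicas[0]
        self.isr = set(replicas)
        self.acked = []  # writes the producer was told had succeeded

    def isolate_followers(self):
        # Network partition: followers fall behind and leave the ISR.
        self.isr = {self.leader}

    def write(self, message):
        # A write is acknowledged once every current ISR member has it.
        for replica in self.isr:
            self.logs[replica].append(message)
        self.acked.append(message)

    def kill_leader_and_elect(self):
        dead = self.leader
        del self.logs[dead]
        self.isr.discard(dead)
        # Prefer an in-sync replica; otherwise fall back to any survivor.
        candidates = sorted(self.isr) or sorted(self.logs)
        self.leader = candidates[0]
        self.isr = {self.leader}

    def lost_writes(self):
        surviving_log = self.logs[self.leader]
        return [m for m in self.acked if m not in surviving_log]


partition = Partition(["n1", "n2", "n3"])
for message in ["w0", "w1", "w2"]:
    partition.write(message)      # replicated to all three nodes

partition.isolate_followers()     # ISR shrinks to the leader alone
for message in ["w3", "w4"]:
    partition.write(message)      # acknowledged, but stored only on n1

partition.kill_leader_and_elect() # n1 is lost, n2 takes over
lost = partition.lost_writes()
```

In this model every write acknowledged while the ISR held only the leader disappears once that leader is gone, which is the window the test exploits.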

THANK YOU

Links

● https://aphyr.com/posts/315-jepsen-rabbitmq
● https://aphyr.com/posts/293-jepsen-kafka
● https://thoughtworks.jiveon.com/people/tbartlet/blog/2015/11/02/project-metamorphosis-with-kafka-spark
● https://thoughtworks.jiveon.com/message/1013489
● https://medium.com/@ikem/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8#.x4f9ezrwn
● https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
● http://microservices.io/patterns/microservices.html
● http://martinfowler.com/articles/microservices.html
● https://engineering.linkedin.com/kafka/running-kafka-scale
● https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
● http://martinfowler.com/bliki/MonolithFirst.html
● https://www.oreilly.com/learning/making-sense-of-stream-processing/page/3/integrating-databases-and-kafka-with-change-data-capture
● http://kafka.apache.org/documentation.html
● https://github.com/toddpalino/kafkafromscratch/blob/master/Apache%20Kafka%20from%20Scratch.pdf
● http://www.javaworld.com/article/3060078/big-data/big-data-messaging-with-kafka-part-1.html
● https://sookocheff.com/post/kafka/kafka-in-a-nutshell/

Use Cases - Change Data Capture

● Log compaction
  ○ Kafka + Kafka Connect
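Log compaction is what makes a topic usable as a change-data-capture feed: only the most recent record per key needs to be retained. A minimal sketch of that idea follows, assuming records are plain (key, value) pairs; the `compact` function and the sample keys are hypothetical, and details such as delete markers are left out.

```python
def compact(log):
    """Keep only the latest record for each key, in original log order."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (offset, value) in survivors]


# Row changes captured from a hypothetical users table.
changes = [
    ("user:1", "alice"),
    ("user:2", "bob"),
    ("user:1", "alice@new"),
]
compacted = compact(changes)
```

Replaying the compacted log still rebuilds the current state of every row, which is why a new consumer can bootstrap from it without the full change history.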

Partitioning

● Custom Partitioner
  ○ Write your own logic
● Default Partitioner
  ○ Manual
  ○ Hashing
    ■ The most common approach
    ■ Messages with the same key go to the same partition
  ○ Spraying
    ■ Random partitioning
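The three strategies above can be sketched as plain functions. This is an illustration under stated assumptions: the function names are hypothetical, and `zlib.crc32` stands in for whatever hash a real client library uses.

```python
import random
import zlib


def manual_partition(partition, num_partitions):
    """Manual: the producer names the target partition explicitly."""
    if not 0 <= partition < num_partitions:
        raise ValueError("partition out of range")
    return partition


def hash_partition(key, num_partitions):
    """Hashing: the same key always maps to the same partition."""
    # crc32 is a stand-in hash chosen for this sketch (assumption).
    return zlib.crc32(key) % num_partitions


def spray_partition(num_partitions, rng=random):
    """Spraying: pick a partition at random to spread the load."""
    return rng.randrange(num_partitions)


same_key_a = hash_partition(b"user-42", 4)
same_key_b = hash_partition(b"user-42", 4)
sprayed = spray_partition(4)
```

Hashing preserves per-key ordering because all messages for a key share one partition; spraying gives up that ordering in exchange for an even spread.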

Date post:	16-Apr-2017
Category:	Software
Upload:	otavio-carvalho