
Apache Kafka - Free Friday

Date post: 16-Apr-2017
Upload: otavio-carvalho
Apache Kafka Free Friday Luiza Souza / Otávio Carvalho
Transcript
Page 1: Apache Kafka - Free Friday

Apache Kafka

Free Friday

Luiza Souza / Otávio Carvalho

Page 2: Apache Kafka - Free Friday

Apache Kafka

● Apache Kafka is a distributed messaging system
  ○ Provides fast, highly scalable and redundant messaging through a pub-sub model

● It was built at LinkedIn to serve as a central hub for all messaging between their systems

● Focus on scalability and fault tolerance

Page 3: Apache Kafka - Free Friday

Motivation

● Microservices
  ○ "In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery." - Martin Fowler

● Monolith First
  ○ Using microservices as a way to decompose monolithic infrastructures

● Message Queues
  ○ Asynchronous processing
  ○ Decoupling
  ○ Load balancing
  ○ Scalability

Page 4: Apache Kafka - Free Friday

How is it different?

● High throughput
  ○ Millions of events per second per node

● Fault-tolerance guarantees
  ○ Relies on Apache Zookeeper for detection of node failures and leader election
  ○ Maintains a structure called ISR (In-Sync Replica Set) in order to be able to tolerate node failures
  ○ (Claims to) tolerate up to f failures with f+1 replicas without losing data

● Distributed
  ○ More nodes can be added while the system keeps its high-performance and fault-tolerance capabilities

Page 5: Apache Kafka - Free Friday

Comparison with AMQP

● Broker-centric (AMQP)
  ○ AMQP implementations are usually broker-centric
  ○ Focus on delivery guarantees between producers/consumers
  ○ Transient messages preferred over durable messages
  ○ Use the broker itself to maintain the state of what has been consumed (via message acknowledgements)

● Producer-centric (Kafka)
  ○ Partitions a fire hose of event data into durable message brokers with cursors (pointers)
  ○ Supports batch consumers that may be offline, or online consumers that want messages at low latency
  ○ Has no message acknowledgements; it assumes the consumer tracks what has been consumed so far

Page 6: Apache Kafka - Free Friday

Kafka Terminology

● Producers
  ○ Processes that publish messages to topics
● Consumers
  ○ Processes that read messages from topics
● Topic
  ○ Name of the feed to which messages are published
● Broker
  ○ Process running on a single machine
● Cluster
  ○ Group of brokers working together

Page 7: Apache Kafka - Free Friday

Kafka Terminology

● Partitions
  ○ Subdivision of Topics
    ■ Scalability
    ■ Load balancing
  ○ Consumers control their own offsets
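
The idea that consumers control their own offsets can be illustrated with a small in-memory model. This is only a sketch and does not use a Kafka client: `PartitionLog` and `Consumer` are hypothetical names, and a real partition lives on a broker rather than in a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, message):
        """Append a message and return the offset it was stored at."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at offset; nothing is deleted."""
        return self.messages[offset:offset + max_messages]


@dataclass
class Consumer:
    """The consumer, not the log, remembers how far it has read."""
    log: PartitionLog
    offset: int = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Rewind or skip ahead by moving the consumer's own cursor."""
        self.offset = offset


log = PartitionLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

fast = Consumer(log)
slow = Consumer(log)
first_batch = fast.poll()                 # reads all three messages
slow_batch = slow.poll(max_messages=1)    # reads only the first one
fast.seek(0)                              # replay from the beginning
replayed = fast.poll()
```

Because reading never removes anything from the log, the two consumers progress independently, and either one can replay old messages by moving its offset back.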

Page 8: Apache Kafka - Free Friday

Kafka Terminology

● Replication
  ○ In-Sync Replica (ISR) sets

Figure 1. A Kafka cluster with 4 brokers, 1 topic and 2 partitions, each with 3 replicas
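
A rough way to picture the ISR is a set of replicas that all hold every committed message, so any of them can take over as leader. The sketch below is a simplified model under that assumption, not Kafka's actual replication protocol; `ReplicatedPartition` and its methods are hypothetical names, and the "lowest broker id becomes leader" rule is an arbitrary choice made for the example.

```python
class ReplicatedPartition:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, broker_ids):
        self.logs = {broker: [] for broker in broker_ids}
        self.isr = set(broker_ids)        # replicas currently in sync
        self.leader = min(self.isr)       # arbitrary deterministic choice

    def write(self, message):
        """Treat a write as committed once every in-sync replica has it."""
        for broker in self.isr:
            self.logs[broker].append(message)

    def fail(self, broker):
        """Remove a failed broker; promote an in-sync replica if it led."""
        self.isr.discard(broker)
        self.logs.pop(broker, None)
        if not self.isr:
            raise RuntimeError("all replicas lost, data is gone")
        if broker == self.leader:
            self.leader = min(self.isr)

    def read_committed(self):
        """Return the committed messages as seen by the current leader."""
        return list(self.logs[self.leader])


partition = ReplicatedPartition(broker_ids=[1, 2, 3])  # f + 1 = 3 replicas
partition.write("a")
partition.write("b")
partition.fail(1)   # first failure: broker 2 takes over
partition.fail(2)   # second failure: broker 3 takes over
partition.write("c")
survivor_view = partition.read_committed()
```

With 3 replicas, the model survives 2 failures (f = 2) and only loses data when the last in-sync replica fails.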

Page 9: Apache Kafka - Free Friday

Use Cases

● Messaging

● Distributed log / Log aggregation

● Change Data Capture

● Stream Processing / Event Sourcing

Page 10: Apache Kafka - Free Friday

Use Cases - Messaging

● Messaging
  ○ Simple Queueing
    ■ e.g. Queue for sending e-mails
  ○ Tracking user events
  ○ Near real-time metrics

Page 11: Apache Kafka - Free Friday

Use Cases - Distributed Log

● Distributed log / Log aggregation
  ○ LinkedIn usage
    ■ The whole platform is built around a central log
    ■ 13 million messages/sec, 15 gigabytes per sec
    ■ Over 1100 brokers in more than 60 clusters

Page 12: Apache Kafka - Free Friday

Use Cases - Change Data Capture

Page 13: Apache Kafka - Free Friday

Use Cases - Stream Processing

● Stream Processing / Event Sourcing

Figures: LinkedIn's example, Netflix's example

Page 14: Apache Kafka - Free Friday

DEMO


Page 15: Apache Kafka - Free Friday

ISSUES

Page 16: Apache Kafka - Free Friday
Page 17: Apache Kafka - Free Friday
Page 18: Apache Kafka - Free Friday

Issues

● CAP theorem (Consistency, Availability, Partition tolerance)
  ○ "You can't sacrifice partition tolerance"

● Jepsen tests (@aphyr)
  ○ To force a failure in Kafka, the test shrinks the ISR (In-Sync Replica Set) to a single node (the leader) and then loses the leader itself
    ■ This triggers a leader election, and a new leader is elected
      ● Kafka loses ~50% of the writes made during the partition
    ■ Kafka users usually set a replication factor of 2 or 3 replicas for each partition of a given topic
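
The failure scenario above can be replayed in a toy simulation. This is a sketch under stated assumptions, not a reproduction of the Jepsen test: the broker ids, the `simulate` function, and the number of writes are all made up for illustration, and the model assumes that a replica which had dropped out of the ISR is allowed to become the new leader.

```python
def simulate(writes_before, writes_during):
    """Return (acked, lost) for a toy run of the failure scenario."""
    logs = {1: [], 2: [], 3: []}   # broker id -> replicated messages
    leader = 1
    isr = {1, 2, 3}
    acked = []

    def write(value):
        # The leader acknowledges once every replica in the ISR has the write.
        for broker in isr:
            logs[broker].append(value)
        acked.append(value)

    for value in range(writes_before):
        write(value)

    # Network partition: the followers cannot keep up and leave the ISR.
    isr.difference_update({2, 3})
    for value in range(writes_before, writes_before + writes_during):
        write(value)               # now stored on the leader only

    # The leader is lost, and a replica that was out of sync takes over.
    del logs[leader]
    new_leader = min(logs)
    lost = [value for value in acked if value not in logs[new_leader]]
    return acked, lost


acked, lost = simulate(writes_before=5, writes_during=5)
```

Every write acknowledged while the ISR contained only the leader is missing from the new leader's log. The share of lost writes in this model depends entirely on the arbitrary split between `writes_before` and `writes_during`, so it should not be read as the measured figure.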

Page 19: Apache Kafka - Free Friday
Page 20: Apache Kafka - Free Friday

THANK YOU


Luiza Souza / Otávio Carvalho

Page 21: Apache Kafka - Free Friday

Links

● https://aphyr.com/posts/315-jepsen-rabbitmq
● https://aphyr.com/posts/293-jepsen-kafka
● https://thoughtworks.jiveon.com/people/tbartlet/blog/2015/11/02/project-metamorphosis-with-kafka-spark
● https://thoughtworks.jiveon.com/message/1013489
● https://medium.com/@ikem/event-sourcing-and-cqrs-a-look-at-kafka-e0c1b90d17d8#.x4f9ezrwn
● https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
● http://microservices.io/patterns/microservices.html
● http://martinfowler.com/articles/microservices.html
● https://engineering.linkedin.com/kafka/running-kafka-scale
● https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
● http://martinfowler.com/bliki/MonolithFirst.html

Page 22: Apache Kafka - Free Friday

Links

● https://www.oreilly.com/learning/making-sense-of-stream-processing/page/3/integrating-databases-and-kafka-with-change-data-capture
● http://kafka.apache.org/documentation.html
● https://github.com/toddpalino/kafkafromscratch/blob/master/Apache%20Kafka%20from%20Scratch.pdf
● http://www.javaworld.com/article/3060078/big-data/big-data-messaging-with-kafka-part-1.html
● https://sookocheff.com/post/kafka/kafka-in-a-nutshell/

Page 23: Apache Kafka - Free Friday

Use Cases - Change Data Capture

● Log compaction
  ○ Kafka + Kafka Connect
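
Log compaction can be thought of as keeping only the latest record per key, which is why it suits change data capture: the compacted log still holds the current state of every row. The function below is a minimal sketch of that idea on an in-memory list; `compact` and the sample keys are hypothetical, and Kafka's real compaction runs in the background on log segments rather than in a single pass.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving log order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset    # later records overwrite earlier ones
    return [
        record
        for offset, record in enumerate(log)
        if latest_offset[record[0]] == offset
    ]


# A change log for two database rows; user:1 is updated twice.
change_log = [
    ("user:1", "v1"),
    ("user:2", "v1"),
    ("user:1", "v2"),
    ("user:1", "v3"),
]
compacted = compact(change_log)
```

Here `compacted` holds one record for `user:2` and only the last update for `user:1`, so a consumer that reads it from the start can rebuild the current state without replaying every intermediate change.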

Page 24: Apache Kafka - Free Friday

Partitioning

● Custom Partitioner
  ○ Write your own logic

● Default Partitioner
  ○ Manual
  ○ Hashing
    ■ The most common approach
    ■ Messages with the same key go to the same partition
  ○ Spraying
    ■ Random partitioning
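
The three default strategies can be sketched as one small function. This is an illustration only: `choose_partition` is a hypothetical helper, and `zlib.crc32` stands in for whatever hash function the Kafka client actually uses, so the partition numbers it produces will not match a real producer.

```python
import random
import zlib


def choose_partition(num_partitions, key=None, partition=None):
    """Pick a partition by manual choice, key hashing, or random spraying."""
    if partition is not None:
        # Manual: the producer names the partition explicitly.
        if not 0 <= partition < num_partitions:
            raise ValueError("partition out of range")
        return partition
    if key is not None:
        # Hashing: the same key always maps to the same partition.
        return zlib.crc32(key) % num_partitions
    # Spraying: no key, so spread messages randomly.
    return random.randrange(num_partitions)


manual = choose_partition(4, partition=3)
hashed_once = choose_partition(4, key=b"user-42")
hashed_again = choose_partition(4, key=b"user-42")
sprayed = choose_partition(4)
```

Hashing keeps all messages for one key in order within a single partition, while spraying gives up that ordering in exchange for an even spread of load.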

