Date post: 22-May-2020
Kafka in Production Andrey Panasyuk, @defascat
Transcript
  • Kafka in Production

    Andrey Panasyuk, @defascat

  • Introduction

    2

  • Remote Calls

    Types
    1. Synchronous calls
    2. Asynchronous calls

    Limitations
    1. Peer-to-peer
    2. Retries
    3. Load balancing
    4. Durability
    5. Backpressure

    3

  • Message Queues

    1. External tool
    2. Asynchronous communication protocol

    4

  • Let's get to Kafka!!!

    5

  • Apache Kafka

    Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log".

    Wikipedia

    6

  • Pub/Sub

    7

    https://s3.amazonaws.com/media-p.slid.es/uploads/contra/images/179840/pubsub.png

  • Concepts. Log

    8

    https://content.linkedin.com/content/dam/engineering/en-us/blog/migrated/log.png
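The log on this slide can be sketched in a few lines. This is a minimal in-memory model, assuming string records; `AppendOnlyLog` and its methods are illustrative names, not Kafka classes. It only shows the two properties the picture relies on: records are appended at the tail, and each record gets a sequential offset that never changes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative in-memory append-only log (not a Kafka class). */
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    /** Appends a record at the tail and returns the offset assigned to it. */
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Returns a copy of all records from the given offset to the tail, in write order. */
    List<String> readFrom(long offset) {
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }
}
```

Appending "a" and then "b" yields offsets 0 and 1, and `readFrom(1)` returns only "b".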

  • Concepts. Data Flow

    9

    https://content.linkedin.com/content/dam/engineering/en-us/blog/migrated/log_subscription.png
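A rough sketch of the subscription idea: every subscriber keeps its own position in the shared log, so a slow reader does not hold back a fast one. `LogSubscriber` is a hypothetical name, and a plain `List` stands in for the log.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative subscriber that tracks its own offset into a shared log. */
class LogSubscriber {
    private final List<String> log; // shared, append-only by convention
    private int nextOffset = 0;     // this subscriber's private position

    LogSubscriber(List<String> log) {
        this.log = log;
    }

    /** Returns records appended since the previous call and advances the offset. */
    List<String> poll() {
        List<String> batch = new ArrayList<>(log.subList(nextOffset, log.size()));
        nextOffset = log.size();
        return batch;
    }
}
```

Two subscribers on the same list see the same records in the same order, each at its own pace.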

  • Concepts. Distributed Log

    10

    https://content.linkedin.com/content/dam/engineering/en-us/blog/migrated/active_and_passive_arch.png

  • Concepts. Partitions

    11

    https://sookocheff.com/post/kafka/kafka-in-a-nutshell/log-anatomy.png
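How a keyed message lands in a partition can be sketched as a hash of the key modulo the partition count. This is a simplification: Kafka's default partitioner hashes the serialized key bytes with murmur2, while the sketch below uses `String.hashCode` as a stand-in. `KeyPartitioner` is a hypothetical name.

```java
/** Illustrative key-to-partition mapping (String.hashCode stands in for murmur2). */
class KeyPartitioner {
    /** Maps a key to a partition index in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

The same key always maps to the same partition, which is what gives per-key ordering inside a single partition.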

  • Concepts. Partitions to consumers

    12

    https://kafka.apache.org/0102/images/consumer-groups.png
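The picture can be approximated with a small assignment function: inside one consumer group, every partition is owned by exactly one consumer. The sketch below spreads partitions round-robin. Kafka ships its own assignors whose details differ, so treat `GroupAssignment` as an illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative round-robin assignment of partitions to the consumers of one group. */
class GroupAssignment {
    /** Assigns partitions 0..numPartitions-1 so that each has exactly one owner. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With two consumers and four partitions each consumer gets two partitions. With more consumers than partitions, the extra consumers get an empty list and sit idle.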

  • Concepts. Architecture

    13

  • I’ve heard this in other presentations. Let's get to it!

    14

    https://68.media.tumblr.com/5cee89085f95187a0459a763ef42b122/tumblr_inline_oo0xjzubVY1uyst35_500.gif

  • Kafka. Controller

    1. One of the brokers
    2. Managing state of partitions
    3. Managing state of replicas
    4. Partition manipulations
    5. High availability

    15

  • Kafka + ZooKeeper

    1. Cluster membership
    2. Electing leader
    3. Topic configuration
    4. Offsets for a Group/Topic/Partition combination

    16

  • Kafka. Guarantees

    1. Delivery guarantees
       a. At least once (by default)
       b. At most once
       c. Exactly once
    2. Fault tolerance vs latency
       a. No ack
       b. Acks from leader
       c. Acks from followers
    3. Message order in a single partition

    17
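The three acknowledgement levels in item 2 correspond to the producer's `acks` setting, whose values are `0`, `1` and `all`. A minimal sketch of that mapping follows; the enum and its method name are hypothetical, only the `acks` key and its values come from the producer configuration.

```java
import java.util.Properties;

/** The acknowledgement levels from the slide mapped to the producer's "acks" setting. */
enum AckLevel {
    NONE("0"),          // no ack: lowest latency, records can be lost silently
    LEADER("1"),        // ack once the partition leader has written the record
    ALL_IN_SYNC("all"); // ack once every in-sync replica has the record

    private final String value;

    AckLevel(String value) {
        this.value = value;
    }

    /** Returns a Properties object carrying only the "acks" entry. */
    Properties toProducerProperties() {
        Properties props = new Properties();
        props.setProperty("acks", value);
        return props;
    }
}
```

Moving down the list trades latency for durability: `ALL_IN_SYNC` waits the longest but survives the loss of the leader.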

  • Kafka. Adding a broker

    1. Adds a new machine into ISR
    2. Starts rebalancing partitions (if automatic rebalance enabled)
       a. Too many partitions can cause an issue
    3. Notifies consumers
    4. Notifies producers

    18

  • Kafka. Failure Scenarios

    1. In-Sync Replicas
    2. Leader election
    3. CAP
       a. Partition Tolerance
       b. Availability
       c. Consistency*

    19
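Leader election from the in-sync replica set can be sketched as "pick the first in-sync replica whose broker is still alive". This is a simplified model of what the controller does, with hypothetical names, and it ignores details such as preferred replicas.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative leader election restricted to in-sync replicas. */
class PartitionState {
    /** Returns the first in-sync replica on a live broker, or empty if there is none. */
    static Optional<Integer> electLeader(List<Integer> inSyncReplicas, Set<Integer> liveBrokers) {
        return inSyncReplicas.stream()
                .filter(liveBrokers::contains)
                .findFirst();
    }
}
```

An empty result is where the CAP trade-off shows up: the partition either stays unavailable until an in-sync replica returns, or an out-of-sync replica takes over and some records may be lost.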

  • I’m a Java Developer. Show me the code!

    20

    https://image.spreadshirtmedia.com/image-server/v1/compositions/1006883748/views/1,width=300,height=300,appearanceId=2,backgroundColor=E8E8E8,version=1473664654/in-code-we-trust-men-s-premium-t-shirt.jpg

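  • Kafka. Producer

    // Imports assumed: java.util.Properties, org.apache.kafka.clients.producer.KafkaProducer,
    // org.apache.kafka.clients.producer.ProducerRecord
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", brokers);
    properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
    // ProducerRecord is the message type of KafkaProducer; KeyedMessage belongs to the legacy Scala producer
    ProducerRecord<String, String> data = new ProducerRecord<>("sync", userId, steps);
    producer.send(data);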

    21

  • Kafka. Real-world Producers

    1. Topic name validation
    2. Adding metrics
    3. Adding default metadata

    22
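The slide does not say which rules the topic name validation applied. One plausible sketch, assuming the check mirrors the broker's own naming rules (letters, digits, `.`, `_` and `-`, at most 249 characters, and neither `.` nor `..`); `TopicNames` is a hypothetical helper.

```java
import java.util.regex.Pattern;

/** Illustrative client-side topic name check, run before a record is sent. */
class TopicNames {
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]+");
    private static final int MAX_LENGTH = 249;

    /** Returns true if the name is non-empty, short enough and uses only legal characters. */
    static boolean isValid(String topic) {
        return topic != null
                && topic.length() <= MAX_LENGTH
                && !topic.equals(".")
                && !topic.equals("..")
                && LEGAL.matcher(topic).matches();
    }
}
```

Rejecting a bad name in the producer wrapper matters most where topics are auto-created, because a typo would otherwise silently create a new topic.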

  • Kafka. Message availability

    23

    https://cdn2.hubspot.net/hubfs/540072/blog-files/New_Consumer_Figure_2.png

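  • Kafka. Consumer

    // Imports assumed: java.time.Duration, java.util.Collections, java.util.Properties,
    // org.apache.kafka.clients.consumer.KafkaConsumer
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", brokers);
    properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.setProperty("group.id", groupId);
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
    // subscribe() takes a collection of topic names
    consumer.subscribe(Collections.singletonList("sync"));
    while (true) {
        // poll(Duration) needs client 2.0 or newer; older clients use poll(100)
        consumer.poll(Duration.ofMillis(100))
                .forEach(r -> System.out.println(r.key() + ": " + r.value()));
    }

    24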

  • Kafka. Real-world Consumers

    1. Metrics
    2. Invalid message queue
    3. Separating message processing into KafkaMessageProcessor
    4. Different implementations
       a. 1 thread for all partitions vs 1 thread per partition
       b. Autocommit
       c. Poll periods
       d. Batch support
       e. Rebalancing considerations

    25
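Items 2 and 3 can be illustrated together. The slide names `KafkaMessageProcessor` but does not show it, so the interface below is a guess at its shape, and `SafeDispatcher` is a hypothetical wrapper. A record that fails processing is parked in an invalid message queue, here an in-memory list, instead of stopping the poll loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shape of the processor named on the slide. */
interface KafkaMessageProcessor {
    void process(String key, String value) throws Exception;
}

/** Illustrative wrapper that keeps message handling out of the poll loop. */
class SafeDispatcher {
    private final KafkaMessageProcessor processor;
    // In-memory stand-in for the invalid message queue
    private final List<String> invalidMessages = new ArrayList<>();

    SafeDispatcher(KafkaMessageProcessor processor) {
        this.processor = processor;
    }

    /** Hands one record to the processor; a failure parks the record and lets consumption continue. */
    void dispatch(String key, String value) {
        try {
            processor.process(key, value);
        } catch (Exception e) {
            invalidMessages.add(key + ": " + value);
        }
    }

    List<String> invalidMessages() {
        return invalidMessages;
    }
}
```

In a real deployment the parked records would more likely go to a separate topic or store so that they can be inspected and replayed.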

  • Kafka. Serialization

    public interface Deserializer<T> {
        public void configure(Map<String, ?> configs, boolean isKey);
        public T deserialize(String topic, byte[] data);
        public void close();
    }

    public interface Serializer<T> {
        public void configure(Map<String, ?> configs, boolean isKey);
        public byte[] serialize(String topic, T data);
        public void close();
    }

    26
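A matching serializer and deserializer pair might look like the sketch below. To stay compilable without the Kafka client jar it declares local copies of the two interfaces from the slide, with a type parameter `T`. The `Integer`-as-text encoding and the class names are assumptions chosen only to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Local copies of the interfaces on the slide, so the sketch needs no Kafka dependency.
interface Serializer<T> {
    void configure(Map<String, ?> configs, boolean isKey);
    byte[] serialize(String topic, T data);
    void close();
}

interface Deserializer<T> {
    void configure(Map<String, ?> configs, boolean isKey);
    T deserialize(String topic, byte[] data);
    void close();
}

/** Writes an Integer as its decimal text in UTF-8; null stays null. */
class IntegerAsTextSerializer implements Serializer<Integer> {
    public void configure(Map<String, ?> configs, boolean isKey) { }

    public byte[] serialize(String topic, Integer data) {
        return data == null ? null : data.toString().getBytes(StandardCharsets.UTF_8);
    }

    public void close() { }
}

/** Reads back what IntegerAsTextSerializer wrote. */
class IntegerAsTextDeserializer implements Deserializer<Integer> {
    public void configure(Map<String, ?> configs, boolean isKey) { }

    public Integer deserialize(String topic, byte[] data) {
        return data == null ? null : Integer.valueOf(new String(data, StandardCharsets.UTF_8));
    }

    public void close() { }
}
```

The pair must stay symmetric: whatever `serialize` writes, `deserialize` has to read back, including the null case.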

  • Kafka. Consumer Failure

    1. Wait for ZooKeeper timeout
    2. Controller processes event from ZooKeeper
    3. Controller notifies consumers
    4. Consumers select new partition consumer

    27

  • Do you really have all this mess working?

    28

    https://qph.ec.quoracdn.net/main-qimg-e6b7a6c0c6809c79b12237ded2f28345-c

  • Kafka. Corporate Challenge Usages

    1. User Sync Processing
    2. Analytics

    29

  • Kafka. Our Deployment

    1. Yahoo kafka-manager
    2. MirrorMaker

    30

  • Kafka. Practices

    1. Topics manually created on prod, automatically on QA envs
    2. Do not delete topics (KAFKA-1397, KAFKA-2937, KAFKA-4834, ...)
    3. IMQ implementation
    4. Use identical versions on all brokers

    31

    https://issues.apache.org/jira/browse/KAFKA-1397
    https://issues.apache.org/jira/browse/KAFKA-2937
    https://issues.apache.org/jira/browse/KAFKA-4834

  • Kafka. Tuning

    1. 20-100 brokers per cluster; hard limit of 10,000 partitions per cluster (Netflix)
    2. Increase replica.lag.time.max.ms and replica.lag.max.messages
    3. Increase num.replica.fetchers
    4. Reduce retention
    5. Increase rebalance.max.retries, rebalance.backoff.ms

    32

    http://www.funny-games.biz/images/pictures/1290-low-cost-tuning.jpg

  • Monitoring And Alerting

    1. Consumer metrics
    2. Producer metrics
    3. Kafka Broker metrics
    4. ZooKeeper metrics
    5. PagerDuty alerts

    33

  • Current State. Message Input Rate

    34

  • Current State. Producer Latency

    35

  • Let's wrap this up!

    36

  • Kafka. Extension Points

    ● Storages
      ○ Amazon S3 (Sink)
      ○ Files (Source)
      ○ Elasticsearch (Sink)
      ○ HDFS (Sink)
      ○ JDBC (Source, Sink)
      ○ C* (Sink)
      ○ PostgreSQL (Sink)
      ○ Oracle/MySQL/MSSQL (Sink)
      ○ Vertica (Source, Sink)
      ○ Ignite (Source, Sink)
    ● Protocols/Queues
      ○ MQTT (Source)
      ○ SQS (Source)
      ○ JMS (Sink)
      ○ RabbitMQ (Source)
    ● Others
      ○ Mixpanel (Sink)

    37

  • Alternatives. ActiveMQ

    1. Pros
       a. Simplicity
       b. Much richer feature set (standard protocols, TTLs, in-memory)
       c. DLQ
       d. Extension points
    2. Cons
       a. Delivery guarantees
       b. Losing messages under high load
       c. Failure handling scenarios
       d. Throughput in transactional mode

    38

    https://upload.wikimedia.org/wikipedia/commons/4/42/Apache-activemq-logo.png

  • Alternatives. RabbitMQ

    ● Pros
      ○ Simpler to start
      ○ More features
        ■ Ability to query/filter
        ■ Federated queues
        ■ Sophisticated routing
      ○ Plugins
    ● Cons
      ○ Scales mostly vertically
      ○ Assumes consumers are mostly online
      ○ Delivery guarantees are less rich

    39

    https://sub.watchmecode.net/wp-content/uploads/2016/09/rabbitmq-logo.jpg

  • Kafka. Strengths and Weaknesses

    1. Strengths
       a. Horizontal scalability
       b. Rich delivery guarantee models
       c. Disk persistence
    2. Weaknesses
       a. Need for ZooKeeper
       b. Lack of any kind of backpressure
       c. Lack of useful features other queues have
       d. Lack of any kind of DLQ
       e. Limited number of extension points
       f. Complex internal protocols
       g. Too smart clients

    40

  • 41

    http://cdn2.arkive.org/media/E8/E867E9DF-746B-49A3-8A65-0CA4033BE700/Presentation.Large/Macaroni-penguin-on-glacier-with-one-wing-raised.jpg
