
Kafka presentation

Date post: 05-Apr-2017
Upload: mohammed-fazuluddin
Transcript
Page 1: Kafka presentation

KAFKA

KAFKA In and Out

@Mohammed Fazuluddin

Page 2: Kafka presentation

TOPICS

• Kafka overview

• Kafka advantages

• Kafka real-time use cases

• Kafka useful links for reference/help

Page 3: Kafka presentation

KAFKA OVERVIEW

Page 4: Kafka presentation

KAFKA OVERVIEW

• Kafka is another Apache project: an open source messaging system.

• Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis.

• Kafka can carry, for example, geospatial data from a fleet of long-haul trucks or sensor data from heating and cooling equipment in office buildings.

• Kafka brokers massive message streams for low-latency analysis in Enterprise Apache Hadoop.

• Kafka is often used in place of traditional message brokers, such as those based on JMS or AMQP, because of its higher throughput, reliability, and replication.

Page 5: Kafka presentation

KAFKA ADVANTAGES

• Apache™ Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system.

• Apache Kafka supports a wide range of use cases as a general-purpose messaging system for scenarios where high throughput, reliable delivery, and horizontal scalability are important.

Page 6: Kafka presentation

KAFKA ADVANTAGES

• Common use cases where Kafka can be integrated into a system architecture:

• Stream Processing

• Website Activity Tracking

• Metrics Collection and Monitoring

• Log Aggregation

Page 7: Kafka presentation

KAFKA ADVANTAGES

• Kafka supported features and descriptions:

Feature      Description

Scalability  Distributed system scales easily with no downtime.

Durability   Persists messages on disk, and provides intra-cluster replication.

Reliability  Replicates data, supports multiple subscribers, and automatically balances consumers in case of failure.

Performance  High throughput for both publishing and subscribing, with disk structures that provide constant performance even with many terabytes of stored messages.

Page 8: Kafka presentation

KAFKA REAL-TIME USE CASES

• Kafka is used to capture real-time events.

• Kafka lets real-time systems publish and consume events with high throughput.

• Kafka supports the following features in real time:
  • Persisting messages
  • Integration in distributed systems
  • Integration in multi-client systems

Page 9: Kafka presentation

KAFKA REAL-TIME USE CASES.

• Kafka Producer-Broker-Consumer communication diagram:

Page 10: Kafka presentation

KAFKA REAL-TIME USE CASES.

• Kafka is used in real-time analysis, which includes:
  • Website monitoring
  • Network monitoring
  • Fraud detection
  • Web clicks
  • Advertising
  • Internet of Things: sensors

• It is becoming important to process events as they arrive for real-time insights, but high performance at scale is necessary to do this.

• Kafka can be integrated with Apache Spark Streaming, MapR-DB, and MapR Streams for fast, event-driven applications.

Page 11: Kafka presentation

KAFKA REAL-TIME USE CASES.

• Example use case:
  • The example use case we will look at here is an application that monitors oil wells. Sensors in oil rigs generate streaming data, which is processed by Spark and stored in HBase for use by various analytical and reporting tools.

• We want to store every single event in HBase as it streams in. We also want to filter for and store alarms. Daily Spark processing will store aggregated summary statistics.
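The three goals above (keep every event, pick out alarms, compute summary statistics) can be sketched in plain Java without Spark or HBase. This is only an illustration: the `SensorEvent` fields, the `psi` measurement, and the "pressure below a threshold" alarm rule are assumptions, since the slides do not define the event format.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

final class SensorPipelineSketch {
    // Hypothetical event shape; the slides do not specify the sensor fields.
    static final class SensorEvent {
        final String wellId;
        final double psi;

        SensorEvent(String wellId, double psi) {
            this.wellId = wellId;
            this.psi = psi;
        }
    }

    // Assumed alarm rule for illustration: pressure below a minimum threshold.
    static List<SensorEvent> alarms(List<SensorEvent> events, double minPsi) {
        return events.stream()
                .filter(e -> e.psi < minPsi)
                .collect(Collectors.toList());
    }

    // Aggregated summary (count, min, max, average) over the pressure readings.
    static DoubleSummaryStatistics summary(List<SensorEvent> events) {
        return events.stream().mapToDouble(e -> e.psi).summaryStatistics();
    }

    public static void main(String[] args) {
        // In the real pipeline every event would be written to HBase as it arrives;
        // here a list stands in for that store.
        List<SensorEvent> events = new ArrayList<>();
        events.add(new SensorEvent("well-1", 95.0));
        events.add(new SensorEvent("well-1", 4.0));
        events.add(new SensorEvent("well-2", 88.0));

        System.out.println("alarms: " + alarms(events, 5.0).size());
        System.out.println("summary: " + summary(events));
    }
}
```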

Page 12: Kafka presentation

KAFKA REAL-TIME USE CASES.

• How do we do this with high performance at scale?
  • We need to collect the data, process the data, store the data, and finally serve the data for analysis, machine learning, and dashboards.
  • Streaming data ingestion: Spark Streaming supports data sources such as HDFS directories, TCP sockets, Kafka, Flume, and Twitter. We will use MapR Streams, a new distributed messaging system for streaming event data at scale.

• MapR Streams enables producers and consumers to exchange events in real time via the Apache Kafka 0.9 API. MapR Streams integrates with Spark Streaming via the Kafka direct approach.

• MapR Streams (or Kafka) topics are logical collections of messages. Topics organize events into categories. Topics decouple producers, which are the sources of data, from consumers, which are the applications that process, analyze, and share data.
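The decoupling described above can be illustrated with a toy in-memory model. This is not the Kafka or MapR Streams API; `TopicSketch`, `publish`, and `read` are hypothetical names, and the sketch ignores partitions, persistence, and thread safety. It only shows that producers append to a named topic and consumers read from it by position, without either side knowing about the other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopicSketch {
    // Topic name -> append-only list of messages (a stand-in for a topic log).
    private final Map<String, List<String>> topics = new HashMap<>();

    // Producer side: append a message to a topic and return its position (offset).
    long publish(String topic, String message) {
        List<String> log = topics.computeIfAbsent(topic, t -> new ArrayList<>());
        log.add(message);
        return log.size() - 1;
    }

    // Consumer side: read all messages of a topic starting at the given offset.
    List<String> read(String topic, int fromOffset) {
        List<String> log = topics.getOrDefault(topic, new ArrayList<>());
        int start = Math.max(0, fromOffset);
        if (start >= log.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(log.subList(start, log.size()));
    }

    public static void main(String[] args) {
        TopicSketch broker = new TopicSketch();
        broker.publish("sensors", "well-1 psi=95");
        broker.publish("sensors", "well-1 psi=4");
        broker.publish("alarms", "well-1 low pressure");

        // Two independent consumers read the same topic from different positions.
        System.out.println(broker.read("sensors", 0));
        System.out.println(broker.read("sensors", 1));
    }
}
```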

Page 13: Kafka presentation

KAFKA REAL-TIME USE CASES.

• Multiple publishers and consumers communication diagram:

Page 14: Kafka presentation

KAFKA REAL-TIME USE CASES.

• With HBase: a table is automatically partitioned across a cluster by key range, and each server is the source for a subset of the table. Grouping the data by key range provides fast reads and writes by row key.
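Key-range partitioning can be pictured as a sorted map from each range's start key to the server that holds it, so finding the server for a row key is a single "greatest start key not above this key" lookup. The sketch below uses a `TreeMap` to show the idea; it is not HBase code, the server names are made up, and it compares Java strings where HBase compares raw bytes.

```java
import java.util.Map;
import java.util.TreeMap;

final class KeyRangeSketch {
    // Start key of each range -> server that holds the range.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The owning range is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalArgumentException("no range covers row key " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        KeyRangeSketch table = new KeyRangeSketch();
        table.addRegion("", "server-a");   // keys from "" up to (not including) "m"
        table.addRegion("m", "server-b");  // keys from "m" onwards

        System.out.println(table.serverFor("apple"));
        System.out.println(table.serverFor("pump-7"));
    }
}
```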

Page 15: Kafka presentation

KAFKA REAL-TIME USE CASES.

• MapR Streams producer steps:

• Set producer properties:

• The first step is to set the KafkaProducer configuration properties, which will be used later to instantiate a KafkaProducer for publishing messages to topics.

• Create a KafkaProducer:

• Instantiate a KafkaProducer by providing the set of key-value configuration properties that you set up in the first step. You need to specify the type parameters as the types of the key and value of the messages that the producer will send.
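The first step above can be sketched with the standard `java.util.Properties` class. The three keys shown are standard Kafka producer settings and the serializer class ships with the Kafka client library; the broker address `localhost:9092` and the choice of String keys and values are assumptions for illustration, and a MapR Streams setup may need different values. The second step is shown only as a comment so that the sketch compiles without the Kafka client library on the classpath.

```java
import java.util.Properties;

final class ProducerConfigSketch {
    // Builds the configuration that would later be handed to a KafkaProducer.
    static Properties producerProperties(String bootstrapServers) {
        Properties props = new Properties();
        // Assumed broker address; replace with the address of your own cluster.
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Serializers decide how message keys and values are turned into bytes.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProperties("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + " = " + value));

        // With the Kafka client library available, the second step would be:
        //   KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // The type parameters <String, String> match the two serializers above.
    }
}
```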

Page 16: Kafka presentation

KAFKA REAL-TIME USE CASES.

• Build the ProducerRecord message:

• The ProducerRecord is a key-value pair to be sent to Kafka. It consists of a topic name to which the record is being sent, an optional partition number, and an optional key and a message value.

• The ProducerRecord is also a Java generic class, whose type parameters should match the serialization properties set before.

• Send the message:

• Call the send method on the KafkaProducer passing the ProducerRecord, which will asynchronously send a record to the specified topic.

• The asynchronous send() method adds the record to a buffer of pending records to send, and immediately returns. This allows sending records in parallel without waiting for the responses, and allows the records to be batched for efficiency.

• Finally, call the close method on the producer to release resources. This method blocks until all requests are complete.
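The send-then-close behaviour described above can be modelled with a small stand-alone class. This is a toy model and not the Kafka client: `BufferedSenderSketch` and its methods are hypothetical, and an in-memory list stands in for the topic. It shows only the pattern: `send` puts the record in a buffer and returns at once, a background thread drains the buffer in batches, and `close` blocks until everything buffered has been delivered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

final class BufferedSenderSketch implements AutoCloseable {
    private static final class Pending {
        final String value;
        final CompletableFuture<Integer> ack = new CompletableFuture<>();

        Pending(String value) {
            this.value = value;
        }
    }

    // Marker placed in the buffer by close() to stop the background thread.
    private static final Pending STOP = new Pending("");

    private final BlockingQueue<Pending> buffer = new LinkedBlockingQueue<>();
    private final List<String> delivered = new ArrayList<>(); // stands in for the topic
    private final Thread sender;

    BufferedSenderSketch() {
        sender = new Thread(this::drainLoop, "sender");
        sender.start();
    }

    // Adds the record to the buffer and returns immediately; the future
    // completes with the record's position once it has been delivered.
    CompletableFuture<Integer> send(String value) {
        Pending pending = new Pending(value);
        buffer.add(pending);
        return pending.ack;
    }

    private void drainLoop() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (true) {
                batch.clear();
                batch.add(buffer.take());   // wait for at least one record
                buffer.drainTo(batch);      // then take whatever else is buffered
                for (Pending pending : batch) {
                    if (pending == STOP) {
                        return;
                    }
                    int offset;
                    synchronized (delivered) {
                        delivered.add(pending.value);
                        offset = delivered.size() - 1;
                    }
                    pending.ack.complete(offset);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Blocks until every record buffered before this call has been delivered.
    // Records sent after close() are not handled in this sketch.
    @Override
    public void close() {
        buffer.add(STOP);
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> delivered() {
        synchronized (delivered) {
            return new ArrayList<>(delivered);
        }
    }

    public static void main(String[] args) {
        BufferedSenderSketch producer = new BufferedSenderSketch();
        producer.send("well-1 psi=95");
        producer.send("well-1 psi=4");
        producer.close();
        System.out.println(producer.delivered());
    }
}
```

With the real client, the corresponding calls are `send` on a KafkaProducer with a ProducerRecord, followed by `close`, as the steps above describe.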

Page 17: Kafka presentation

KAFKA USEFUL LINKS FOR REFERENCE/HELP

• https://kafka.apache.org/

• https://www.confluent.io/blog/stream-data-platform-1/

• http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/

Page 18: Kafka presentation

THANKS

• If you find this helpful and worth sharing with other people, please share it.

