Big Data Ingestion with Kafka -> HDFS using Apache Apex

Post on 21-Apr-2017

Big Data Ingestion with Kafka

Chinmay Kolhatkar
chinmay@apache.org

Agenda

● Data Ingestion
● Use case: Kafka => HDFS
● Brief about Kafka
● Steps for development
● Let's code!!!

Data Ingestion

● Reading data in

● Storing in accessible location

● This is the beginning of the data pipeline, i.e. the write path

● From here, the data is processed further, i.e. the read path

Use case: Kafka => HDFS

● Reading from Kafka Messaging Queue

● Writing to HDFS

[Diagram: Kafka => HDFS]

Use case: Examples

● Log aggregation
  ○ Collect logs from various sources
  ○ Stream them as a single topic
  ○ Put all the logs in a centralized place, i.e. HDFS

● Real-time sensor data processing
  ○ Read sensor data from various sources
  ○ Process the stream
  ○ Dump results to HDFS

Brief about Kafka

● Distributed Messaging System

● Fast Reads and Writes

● Can handle a large number of clients

● Scalable, fault-tolerant, partitionable

● Persistent messages

Brief about Kafka (contd.)

● Terminology
  ○ Topic
  ○ Producer
  ○ Consumer
  ○ Broker
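To make the four terms concrete, here is a minimal in-memory sketch in plain Java. It is a toy model, not the Kafka client API: the class names `Broker`, `Producer`, and `Consumer`, the method names, and the topic name "logs" are all assumptions made for illustration. It shows a producer publishing to a topic, a broker keeping the topic as an append-only log, and two consumers each reading the same messages at their own offset, which is roughly what "persistent messages" means in practice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy, in-memory model of the Kafka terms above; NOT the Kafka client API. */
public class KafkaTermsSketch {

    /** Broker: stores each topic as an append-only log of messages. */
    static class Broker {
        private final Map<String, List<String>> topics = new HashMap<>();

        void append(String topic, String message) {
            topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(message);
        }

        /** Returns the messages from the given offset onward; reading deletes nothing. */
        List<String> read(String topic, int offset) {
            List<String> log = topics.getOrDefault(topic, new ArrayList<>());
            int from = Math.min(offset, log.size());
            return new ArrayList<>(log.subList(from, log.size()));
        }
    }

    /** Producer: publishes messages to a named topic on the broker. */
    static class Producer {
        private final Broker broker;

        Producer(Broker broker) {
            this.broker = broker;
        }

        void send(String topic, String message) {
            broker.append(topic, message);
        }
    }

    /** Consumer: reads one topic and tracks its own position (offset) in the log. */
    static class Consumer {
        private final Broker broker;
        private final String topic;
        private int offset = 0;

        Consumer(Broker broker, String topic) {
            this.broker = broker;
            this.topic = topic;
        }

        /** Returns the messages not yet seen by this consumer and advances its offset. */
        List<String> poll() {
            List<String> batch = broker.read(topic, offset);
            offset += batch.size();
            return batch;
        }
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        Producer producer = new Producer(broker);
        producer.send("logs", "app1: started");
        producer.send("logs", "app2: started");

        Consumer first = new Consumer(broker, "logs");
        Consumer second = new Consumer(broker, "logs");
        System.out.println(first.poll());  // both messages
        System.out.println(first.poll());  // empty: nothing new for this consumer
        System.out.println(second.poll()); // both messages again: the log was not consumed away
    }
}
```

A real Kafka deployment adds partitions, replication, and several brokers on top of this picture; the sketch only fixes the vocabulary.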

Steps for developing application

1. Create a Maven project using the Apex Maven archetype
2. Add the required Maven dependencies
3. Add operators to the DAG
4. Add stream(s) to the DAG
5. Set properties in properties.xml
6. Compile and run
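The shape of steps 3, 4, and 6 can be sketched in plain Java without any Apex dependency. This is a toy model under stated assumptions, not the Apex or Malhar API: `MiniDag`, `MessageSource`, `LineFileSink`, and the operator and stream names are hypothetical, an in-memory list stands in for the Kafka topic, and a local temporary file stands in for HDFS. Step 5 (properties.xml) is not modelled here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of "add operators, add streams, run"; MiniDag is hypothetical, NOT the Apex API. */
public class IngestionDagSketch {

    /** A DAG holds named operators and the named streams that connect them. */
    static class MiniDag {
        final Map<String, Object> operators = new LinkedHashMap<>();
        final List<String> streams = new ArrayList<>();

        <T> T addOperator(String name, T operator) {
            operators.put(name, operator);
            return operator;
        }

        /** Connects the source's output to a downstream sink under a stream name. */
        void addStream(String name, MessageSource source, Consumer<String> sink) {
            streams.add(name);
            source.downstream = sink;
        }
    }

    /** Stand-in for a Kafka input operator: emits messages from an in-memory list. */
    static class MessageSource {
        final List<String> messages;
        Consumer<String> downstream = m -> { };

        MessageSource(List<String> messages) {
            this.messages = messages;
        }

        void emitAll() {
            messages.forEach(downstream);
        }
    }

    /** Stand-in for an HDFS output operator: appends each message as a line to a local file. */
    static class LineFileSink implements Consumer<String> {
        final Path file;

        LineFileSink(Path file) {
            this.file = file;
        }

        @Override
        public void accept(String message) {
            try {
                Files.write(file, Collections.singletonList(message), StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Builds the two-operator pipeline, runs it, and returns the lines that were written. */
    static List<String> run(List<String> messages) {
        try {
            Path outputFile = Files.createTempFile("ingest", ".txt");
            MiniDag dag = new MiniDag();
            // Step 3: add operators to the DAG.
            MessageSource in = dag.addOperator("kafkaInput", new MessageSource(messages));
            LineFileSink out = dag.addOperator("fileOutput", new LineFileSink(outputFile));
            // Step 4: add a stream connecting input to output.
            dag.addStream("messages", in, out);
            // Step 6: run the pipeline.
            in.emitAll();
            return Files.readAllLines(outputFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("sensor-1 20.5", "sensor-2 21.0")));
    }
}
```

In an actual Apex application the input and output operators would come from the Malhar library rather than being hand-written, and the topic, broker address, and output path would typically be supplied through properties.xml instead of being hard-coded.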

Summary

● Ease of development using Apex

● Reusable Malhar components

● Fault-tolerant, Scalable

● Reduced Time to Production

Resources

Apache Apex Meetup

• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
  o @ApacheApex; Follow - https://twitter.com/apacheapex
  o @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
  o https://www.datatorrent.com/product/startup-accelerator/

We Are Hiring

• jobs@datatorrent.com
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders