Data Pipelines Made Simple with Apache Kafka

Transcript
Page 1: Data Pipelines Made Simple with Apache Kafka

1

Data Pipelines Made Simple With Apache Kafka
Ewen Cheslack-Postava, Engineer, Apache Kafka Committer

Page 2: Data Pipelines Made Simple with Apache Kafka

2

Attend the whole series!

Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent

Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent

Monitoring and Alerting Apache Kafka with Confluent Control Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product

Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent

https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/

What’s New in Apache Kafka 0.10.2 and Confluent 3.2

Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing

Page 3: Data Pipelines Made Simple with Apache Kafka

3

The Challenge: Streaming Data Pipelines

Page 4: Data Pipelines Made Simple with Apache Kafka

4

Simplifying Streaming Data Pipelines with Apache Kafka

Page 5: Data Pipelines Made Simple with Apache Kafka

5

Kafka Connect

Page 6: Data Pipelines Made Simple with Apache Kafka

6

Streaming ETL

Page 7: Data Pipelines Made Simple with Apache Kafka

7

Single Message Transforms for Kafka Connect

Modify events before storing in Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Store lineage
• Remove unnecessary columns

Modify events going out of Kafka:
• Route high-priority events to faster data stores
• Direct events to different Elasticsearch indexes
• Cast data types to match the destination
• Remove unnecessary columns
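As a rough sketch of the outbound case, a sink connector can chain transforms in its configuration. The connector, file, field, and transform names below (a FileStreamSink reading connect-test, an assumed debug_info column) are illustrative assumptions, not taken from the deck:

# Hypothetical sink that drops a column and routes records by day
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=out.txt
topics=connect-test
transforms=DropDebug,RouteByDay
# ReplaceField removes the (assumed) debug_info column before delivery
transforms.DropDebug.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.DropDebug.blacklist=debug_info
# TimestampRouter rewrites the topic name per day, which a sink such as
# Elasticsearch would typically map to a daily index
transforms.RouteByDay.type=org.apache.kafka.connect.transforms.TimestampRouter
transforms.RouteByDay.topic.format=${topic}-${timestamp}
transforms.RouteByDay.timestamp.format=yyyyMMdd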

Page 8: Data Pipelines Made Simple with Apache Kafka

8

Where Single Message Transforms Fit In

Page 9: Data Pipelines Made Simple with Apache Kafka

9

Built-in Transformations

• InsertField – Add a field using either static data or record metadata
• ReplaceField – Filter or rename fields
• MaskField – Replace a field with the valid null value for its type (0, empty string, etc.)
• ValueToKey – Set the key to one of the value's fields
• HoistField – Wrap the entire event as a single field inside a Struct or a Map
• ExtractField – Extract a specific field from a Struct or Map and include only this field in the result
• SetSchemaMetadata – Modify the schema name or version
• TimestampRouter – Modify the topic of a record based on the original topic and timestamp; useful when using a sink that needs to write to different tables or indexes based on timestamps
• RegexRouter – Modify the topic of a record based on the original topic, a replacement string, and a regular expression
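For example, a connector configuration might chain two of these built-ins to anonymize fields and re-key records. The field names (ssn, credit_card, user_id) are hypothetical and only illustrate the configuration shape:

transforms=Anonymize,KeyByUser
# MaskField replaces the listed value fields with their type's null value
transforms.Anonymize.type=org.apache.kafka.connect.transforms.MaskField$Value
transforms.Anonymize.fields=ssn,credit_card
# ValueToKey sets the record key from a field of the value
transforms.KeyByUser.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.KeyByUser.fields=user_id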

Page 10: Data Pipelines Made Simple with Apache Kafka

10

Configuring Single Message Transforms

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
transforms=MakeMap,InsertSource
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=line
transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertSource.static.field=data_source
transforms.InsertSource.static.value=test-file-source
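Here the FileStreamSource connector reads lines from test.txt and writes them to the connect-test topic: the MakeMap transform (HoistField$Value) wraps each raw line in a single field named line, and InsertSource (InsertField$Value) adds a static data_source=test-file-source field to every record. Assuming a standalone worker, this could be launched with something like bin/connect-standalone.sh config/connect-standalone.properties local-file-source.properties (paths are illustrative).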

Page 11: Data Pipelines Made Simple with Apache Kafka

11

Why only single messages?

• Delivery guarantees!
  • Always provide at-least-once semantics
  • For supported connectors, provide exactly-once semantics
• No additional complication: transformations happen inline with import/export

Page 12: Data Pipelines Made Simple with Apache Kafka

12

When should I use each tool?

Kafka Connect & Single Message Transforms
• Simple, one message at a time
• Transformation can be performed inline
• Transformation does not interact with external systems

Kafka Streams
• Complex transformations, including:
  • Aggregations
  • Windowing
  • Joins
• Transformed data stored back in Kafka, enabling reuse
• Write, deploy, and monitor a Java application

Page 13: Data Pipelines Made Simple with Apache Kafka

13

Conclusion

Single Message Transforms in Kafka Connect
• Lightweight transformation of individual messages
• Configuration-only data pipelines
• Pluggable, with lots of built-in transformations

Page 14: Data Pipelines Made Simple with Apache Kafka

14

Attend the whole series!

Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent

Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent

Monitoring and Alerting Apache Kafka with Confluent Control Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product

Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent

https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/

What’s New in Apache Kafka 0.10.2 and Confluent 3.2

Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing

Page 15: Data Pipelines Made Simple with Apache Kafka

15

Get Started with Apache Kafka Today!

https://www.confluent.io/downloads/

THE place to start with Apache Kafka!

Thoroughly tested and quality assured

More extensible developer experience

Easy upgrade path to Confluent Enterprise

Page 16: Data Pipelines Made Simple with Apache Kafka

16

Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28

Presented by
