
Confluent kafka meetupseattle jan2017

Date post: 13-Apr-2017
Upload: nitin-kumar
State of the Streaming Platform 2017: What’s new in Apache Kafka and the Confluent Platform. David Tucker, Confluent
Transcript
Page 1: Confluent kafka meetupseattle jan2017

Confidential

State of the Streaming Platform 2017
What’s new in Apache Kafka and the Confluent Platform

David Tucker, Confluent

Page 2: Confluent kafka meetupseattle jan2017

The shift to streams

“By 2020, 70% of organizations will adopt data streaming to enable real-time analytics.” [1]

“Streaming ingestion and analytics will become a must-have for digital winners.” [2]

1: Gartner: Harness Streaming Data for Real-Time Analytics - Nov 2016
2: Forrester’s 2016 Predictions: Turn Data Into Insight And Action - Nov 2015

Page 3: Confluent kafka meetupseattle jan2017

Vision of a Streaming Enterprise

[Diagram: a central Streaming Platform surrounded by Search, NewSQL / NoSQL, RDBMS, Monitoring, Document Store, Real-time Analytics, Data Warehouse, Mobile Apps, Legacy Apps, and Hadoop]

Page 4: Confluent kafka meetupseattle jan2017

What Can You Do with a Streaming Platform?

• Publish and Subscribe to streams of data
  • Analogous to traditional messaging systems

• Store streams of data
  • Consumers can look back in time

• Process streams of data
  • Analyze and correlate events in real time
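The three capabilities can be illustrated with a toy, in-memory log. This is only a sketch of the idea, not Kafka's implementation: the `TopicLog` class, the topic name, and the events are all hypothetical. The point it shows is that reading does not consume records, so a new consumer can start from an earlier offset and look back in time.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log: one ordered list of records per topic."""

    def __init__(self):
        self._records = defaultdict(list)

    def publish(self, topic, value):
        """Append a record to the topic and return its offset."""
        self._records[topic].append(value)
        return len(self._records[topic]) - 1

    def read(self, topic, offset=0):
        """Return every record at or after `offset`; nothing is removed."""
        return list(self._records[topic][offset:])


log = TopicLog()
for event in ("login", "click", "purchase"):
    log.publish("user-events", event)

# A subscriber that is already caught up only sees the newest record.
latest = log.read("user-events", offset=2)

# A consumer that joins later can replay the stream from the beginning.
replayed = log.read("user-events", offset=0)

# A trivial processing step over the replayed stream.
purchases = [event for event in replayed if event == "purchase"]
```

Each consumer tracks its own offset, which is what lets several independent readers share one stored stream.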

Page 5: Confluent kafka meetupseattle jan2017

The typical integration architecture

[Diagram: Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Hadoop, Data Warehouse, MySQL, Cassandra, Oracle; an App and a Monitoring App, each with its own Databases, Storage, and Interfaces]

Page 6: Confluent kafka meetupseattle jan2017

Challenges abound

[Diagram: Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Hadoop, Data Warehouse, Espresso, Cassandra, Oracle; an App and a Monitoring App, each with its own Databases, Storage, and Interfaces]

Difficult to handle massive amounts of data

Diverse data sets, arriving at an increasing rate

Many complex data pipelines

Require a separate cluster for real-time

Difficult & time consuming to change

Require mission critical availability into most recent/relevant data

Page 7: Confluent kafka meetupseattle jan2017

Modernized architecture using Apache Kafka

[Diagram: Apache Kafka at the center, with Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Espresso, Cassandra, Oracle, Hadoop, and Data Warehouse; an App and a Monitoring App, each using the Streams API]

Page 8: Confluent kafka meetupseattle jan2017

Challenges addressed by a streaming platform

[Diagram: Apache Kafka at the center, with Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Espresso, Cassandra, Oracle, Hadoop, and Data Warehouse; an App and a Monitoring App, each using the Streams API]

Rewind data stream to re-load into any target system

Scale to meet demands of diverse streams

Pub/sub to data streams

Lightweight, easy to modify with minimal disruption

Decoupled from upstream apps, creating agility

Real-time, context-specific data in the moment

Page 9: Confluent kafka meetupseattle jan2017

From Big Data to Stream Data

Big Data was: The More the Better
[Chart: Value of Data vs. Volume of Data]

Stream Data is: The Faster the Better
[Chart: Value of Data vs. Age of Data]

Stream Data can be: Big or Fast (Lambda)
[Diagram labels: Streams, Hadoop, Job 1, Job 2, Speed Table, Batch Table, DB]

Stream Data will be: Big AND Fast (Kappa)
[Diagram labels: Streams, Job 1, Job 2, Table 1, Table 2, DB]

Apache Kafka is the Enabling Technology of this Transition

Page 10: Confluent kafka meetupseattle jan2017

Ingest, Process, Load, and Serve Data at a Global Scale

[Diagram labels: Data System A … Data System B; two Kafka clusters; Confluent Replicator; Custom Replication; FIX; Raw data / Events; Kafka Streams (Data Enrichment and Transformation); Kafka Connect (Connectors to Extract and Load data); Applications; Other data stores]

Page 11: Confluent kafka meetupseattle jan2017

Confluent: Enterprise Streaming Platform based on Apache Kafka™

[Diagram: Confluent Platform (Confluent Enterprise), comprising Apache Kafka™, Clients, Connectors, Data Compatibility, Monitoring & Administration, and Operations. Surrounding labels: Database Changes, Log Events, IoT Data, Web Events, …; CRM, Data Warehouse, Database, Hadoop; Data Integration, Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications. Legend: Apache Open Source, Confluent Open Source, Confluent Commercial.]

Complete

Open

Trusted

Enterprise Grade

Page 12: Confluent kafka meetupseattle jan2017

How do I get streams of data into and out of my apps?

Connect Clients REST

Page 13: Confluent kafka meetupseattle jan2017

Apache Kafka™ Connect – Streaming Data Capture

[Diagram: Sources and Sinks attached through Connectors and the Kafka Connect API to a Kafka Pipeline; systems shown: JDBC, IRC / Twitter, MySQL, Elastic, NoSQL, HDFS]

Fault tolerant

Manage hundreds of data sources and sinks

Preserves data schema

Part of Apache Kafka project

Integrated within Confluent Platform’s Control Center

Page 14: Confluent kafka meetupseattle jan2017

Apache Kafka™ Connect – Let the framework do the hard work

• Serialization / de-serialization

• Schema Registry integration

• Fault tolerance, automatic fail-over

• Partitioning and scale-out

• … and let the developer focus on the domain-specific details of copying data

Page 15: Confluent kafka meetupseattle jan2017

Kafka Connect Architecture: Logical Model

Connect has three main components: Connectors, Tasks, and Workers

Data flowing into / out of the connectors is a stream; each stream is 1 or more partitions. In practice, a stream partition could be a database table, a log file, etc. There may or may not be an exact alignment of streams to Kafka topics.

Page 16: Confluent kafka meetupseattle jan2017

Kafka Connect Architecture: Execution Model

Host 1 Host 2

Task 1 Task 2 Task 3 Task 4

Worker 1 Worker 2 Worker 3
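The execution model above, with four tasks spread over three workers, can be sketched as a simple assignment function. This is a minimal illustration only: the round-robin policy and the `assign_tasks` helper are assumptions made for the example, not a description of how Kafka Connect actually balances work.

```python
def assign_tasks(tasks, workers):
    """Spread tasks over workers round-robin; returns {worker: [tasks]}."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        assignment[workers[index % len(workers)]].append(task)
    return assignment


tasks = ["Task 1", "Task 2", "Task 3", "Task 4"]
workers = ["Worker 1", "Worker 2", "Worker 3"]

# Normal operation: every worker runs at least one task.
before = assign_tasks(tasks, workers)

# Simulated failure of Worker 2: the same tasks are reassigned to the survivors.
after = assign_tasks(tasks, [w for w in workers if w != "Worker 2"])
```

Because tasks, not connectors, are the unit of assignment, losing a worker only moves its tasks elsewhere; no task is dropped.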

Page 17: Confluent kafka meetupseattle jan2017

Kafka Connect API Library of Connectors

* Denotes connectors developed and distributed by Confluent. Extensive validation and testing have been performed.

[Connector logos grouped under: Databases, Datastore/File Store, Analytics, Applications / Other]

Page 18: Confluent kafka meetupseattle jan2017

Kafka Clients

Ruby Proxy http/REST

Stdin/stdout

Apache Kafka Native Clients

Confluent Native Clients

Community Supported Clients

Page 19: Confluent kafka meetupseattle jan2017

REST Proxy: Talking to Legacy Apps and Across Restricted Networks

REST Proxy

Legacy Applications

Native Kafka Applications

Schema Registry

REST / HTTP

Simplifies administrative actions

Simplifies message creation and consumption

Provides a RESTful interface to a Kafka cluster
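As a rough sketch of what message creation over HTTP can look like, the snippet below builds (but does not send) a produce request. Several details are assumptions for illustration: the proxy address `http://localhost:8082`, the topic name `legacy-orders`, and the use of the REST Proxy v2 JSON content type, which may differ from the API version current when this deck was presented.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build a POST request that would produce JSON records to a topic."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        # Assumed content type (REST Proxy API v2, JSON-embedded records).
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )


# Hypothetical proxy address, topic, and payload.
request = build_produce_request(
    "http://localhost:8082", "legacy-orders", [{"order_id": 1}]
)

# With a proxy running, urllib.request.urlopen(request) would send it.
```

Any application that can issue an HTTP POST can publish this way, which is what makes the proxy useful for legacy apps and restricted networks.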

Page 20: Confluent kafka meetupseattle jan2017

How do I maintain my data formats and ensure compatibility?

Page 21: Confluent kafka meetupseattle jan2017

The Challenge of Data Compatibility at Scale

App 1

App 2

App 3

Many sources without a policy cause mayhem in a centralized data pipeline

Ensuring downstream systems can use the data is key to an operational stream pipeline

Example: Date formats

Even within a single application, different formats can be presented

Incompatibly formatted message

Page 22: Confluent kafka meetupseattle jan2017

Schema Registry

[Diagram: App 1 and App 2, each with a Serializer; Schema Registry; Kafka Topic; Example Consumers: Elastic, NoSQL, HDFS]

Define the expected fields for each Kafka topic

Automatically handle schema changes (e.g. new fields)

Prevent backwards incompatible changes

Supports multi-datacenter environments
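The idea of preventing backwards-incompatible changes can be sketched with a deliberately simplified check. This is not the Schema Registry's algorithm: the `is_backward_compatible` function and the field layout are hypothetical, and real Avro compatibility rules cover more cases (type promotion, unions, aliases).

```python
def is_backward_compatible(old_fields, new_fields):
    """True if a reader using `new_fields` can decode data written with `old_fields`.

    Each argument maps a field name to a dict with "type" and an optional "default".
    """
    for name, spec in new_fields.items():
        if name not in old_fields:
            # Old data lacks this field, so the reader needs a default value.
            if "default" not in spec:
                return False
        elif old_fields[name]["type"] != spec["type"]:
            # Simplification: any type change is treated as incompatible.
            return False
    return True


v1 = {"user": {"type": "string"}, "created_at": {"type": "long"}}

# Adding a field with a default is safe for readers of old data.
v2_ok = dict(v1, country={"type": "string", "default": "unknown"})

# Adding a required field without a default would break them.
v2_bad = dict(v1, country={"type": "string"})
```

A registry applying a check like this at write time can reject `v2_bad` before any producer starts emitting records that existing consumers cannot read.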

Page 23: Confluent kafka meetupseattle jan2017

How do I build stream processing apps?

Page 24: Confluent kafka meetupseattle jan2017

Architecture of Kafka Streams API, a Part of Apache Kafka

[Diagram: Kafka Streams API with Consumers and a Producer; Kafka Cluster with Topics]

Key benefits
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully integrated from Kafka

Example Use Cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes

Page 25: Confluent kafka meetupseattle jan2017

Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™

Example Use Cases

• Microservices

• Large-scale continuous queries and transformations

• Event-triggered processes

• Reactive applications

• Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …

Key Benefits of Apache Kafka’s Streams API

• Build Apps, Not Clusters: no additional cluster required

• Elastic, highly-performant, distributed, fault-tolerant, secure

• Equally viable for small, medium, and large-scale use cases

• “Run Everywhere”: integrates with your existing deployment strategies such as containers, automation, cloud

[Diagram: Your App with the Kafka Streams API]

Page 26: Confluent kafka meetupseattle jan2017

Architecture Example

Before: Complexity for development and operations, heavy footprint

1. Capture business events in Kafka
2. Must process events with separate, special-purpose clusters (Your Processing Job)
3. Write results back to Kafka

Page 27: Confluent kafka meetupseattle jan2017

Architecture Example

With Kafka Streams: App-centric architecture that blends well into your existing infrastructure

1. Capture business events in Kafka
2. Process events fast, reliably, securely with standard Java applications (Your App, with the Kafka Streams API)
3a. Write results back to Kafka
3b. External apps can directly query the latest results

Page 28: Confluent kafka meetupseattle jan2017

How do I manage and monitor my streaming platform at scale?

Page 29: Confluent kafka meetupseattle jan2017

Confluent Control Center: End-to-end Monitoring

See exactly where your messages are going in your Kafka cluster

Page 30: Confluent kafka meetupseattle jan2017

Confluent Control Center: Connector Management

Page 31: Confluent kafka meetupseattle jan2017

Confluent Control Center: Alerting

Alerts

• Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more

• Manage alerts for different users and applications from a web UI

User authentication

• Control access to Confluent Control Center

• Integrates with existing enterprise authentication systems

Page 32: Confluent kafka meetupseattle jan2017

Data Pipeline Demo
Real-time data firehose archived to searchable stores

Page 33: Confluent kafka meetupseattle jan2017

Demo Scenario: Multiple Streaming Data Pipelines

• IRC feed of Wikipedia updates
  • IRC Source connector publishes real-time stream of Wikipedia updates to Kafka topic
  • Kafka Streams application parses records and re-writes to new topic
  • Elasticsearch Sink connector indexes parsed data
  • Kibana dashboards visualize Wikipedia updates in real time

• Twitter feed augmented with sentiment data
  • Twitter Source connector configured to publish data to Kafka topic
  • Kafka Streams application strips extraneous Twitter fields and adds sentiment score
  • Sink connector saves K-Streams output to key-value store (e.g. Couchbase or DynamoDB)
  • Key-value queries can track sentiment trends

Page 34: Confluent kafka meetupseattle jan2017

Wikipedia-to-Elastic Data Pipeline

Page 35: Confluent kafka meetupseattle jan2017

Wikipedia Transformation

• Raw input records

{"createdat":1485386068652,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[List of Iranian Americans]] https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313 * 01:445:4080:1510:F1A4:7C08:B276:FA8B * (+0) /* Media/Journalism */"}
{"createdat":1485386069199,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[In the Bleak Midwinter]] https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970 * Grover cleveland * (+422) /* Settings */"}

• Parsed records

{"createdat":1485386068652,"wikipage":"List of Iranian Americans","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313","username":"01:445:4080:1510:F1A4:7C08:B276:FA8B","bytechange":0,"commitmessage":"/* Media/Journalism */"}
{"createdat":1485386069199,"wikipage":"In the Bleak Midwinter","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970","username":"Grover cleveland","bytechange":422,"commitmessage":"/* Settings */"}
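The transformation from raw to parsed records can be approximated in a few lines. The demo itself used a Kafka Streams application; the sketch below is only a Python stand-in for the parsing step. The regular expression and the flag letters (`N` new, `M` minor, `B` bot, `!` unpatrolled) are assumptions, since the sample records above carry no flags.

```python
import json
import re

# Assumed layout: [[title]] [flags] url * user * (+/-bytes) comment
MESSAGE_PATTERN = re.compile(
    r"\[\[(?P<title>.*?)\]\]\s+(?P<flags>[!NMB]*)\s*(?P<url>\S+)"
    r"\s+\*\s+(?P<user>.+?)\s+\*\s+\((?P<delta>[+-]\d+)\)\s*(?P<comment>.*)"
)


def parse_record(raw_json):
    """Turn one raw IRC record into the flat parsed form, or None if it does not match."""
    raw = json.loads(raw_json)
    match = MESSAGE_PATTERN.match(raw["message"])
    if match is None:
        return None
    flags = match.group("flags")
    return {
        "createdat": raw["createdat"],
        "wikipage": match.group("title"),
        "isnew": "N" in flags,          # assumed flag letter
        "isminor": "M" in flags,        # assumed flag letter
        "isunpatrolled": "!" in flags,  # assumed flag character
        "isbot": "B" in flags,          # assumed flag letter
        "diffurl": match.group("url"),
        "username": match.group("user"),
        "bytechange": int(match.group("delta")),
        "commitmessage": match.group("comment"),
    }


raw_record = json.dumps({
    "createdat": 1485386069199,
    "channel": "#en.wikipedia",
    "message": "[[In the Bleak Midwinter]] "
               "https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970"
               " * Grover cleveland * (+422) /* Settings */",
})
parsed = parse_record(raw_record)
```

Applied to the second raw record above, this yields the same fields as the corresponding parsed record.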

Page 36: Confluent kafka meetupseattle jan2017

Twitter Transformation

• Raw input records

"CreatedAt": 1479252348000,
"Id": 798668350956126200,
"Text": "Iago Aspas pays tribute to #Spain players for making his international debut “easy” vs #England… https://t.co/G13NUaZj8W",
"Source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
"User": { }

128 separate fields

• Filtered records

{"sentiment":"Negative","sentimentScore":1,"UserName":"tits","CreatedAt":1485387765000,"Text":"RT @STsportsdesk: Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo https://t.co/QfCLayqRsHhttps://t.co/53GWANPDXj","id":"824402156707049475","UserScreenName":"titusanghongwen"}
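A rough sketch of the filtering step follows. The sentiment logic here is a toy word-list stand-in, not the scorer used in the demo, and several names are assumptions: the word lists, the 1/2/3 score scale (chosen only so that "Negative" maps to 1 as in the sample), and the nested `User` keys `Name` and `ScreenName`, which the slide elides.

```python
import re

# Hypothetical word lists; a real sentiment model would replace these.
NEGATIVE_WORDS = frozenset({"eliminated", "lost", "defeat"})
POSITIVE_WORDS = frozenset({"win", "won", "easy"})


def score_sentiment(text):
    """Toy lexicon scorer returning (label, score); the scale is an assumption."""
    words = re.findall(r"[a-z']+", text.lower())
    balance = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if balance < 0:
        return "Negative", 1
    if balance > 0:
        return "Positive", 3
    return "Neutral", 2


def filter_tweet(tweet):
    """Keep a handful of fields from the raw tweet and attach a sentiment score."""
    label, score = score_sentiment(tweet["Text"])
    user = tweet.get("User", {})
    return {
        "sentiment": label,
        "sentimentScore": score,
        "UserName": user.get("Name"),              # assumed nested key
        "CreatedAt": tweet["CreatedAt"],
        "Text": tweet["Text"],
        "id": str(tweet["Id"]),
        "UserScreenName": user.get("ScreenName"),  # assumed nested key
    }


tweet = {
    "CreatedAt": 1485387765000,
    "Id": 824402156707049475,
    "Text": "RT @STsportsdesk: Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo",
    "Source": "Twitter Web Client",
    "User": {"Name": "Example User", "ScreenName": "example_user"},  # hypothetical values
}
filtered = filter_tweet(tweet)
```

Everything not explicitly copied, such as `Source`, is dropped, which is how a record with many fields shrinks to the seven shown in the filtered output.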

Page 37: Confluent kafka meetupseattle jan2017

Kafka Connect Demonstration

[Diagram: Kafka Connect, Apache Kafka Brokers, and K-Streams app(s), with data-flow steps numbered 1 to 5]

Page 38: Confluent kafka meetupseattle jan2017

Thank You

