Confluent: Streaming operational data with Kafka – Couchbase Connect 2016

Date post: 15-Apr-2017
Upload: couchbase
Transcript

1Confidential

State of the Streaming Platform 2016: What’s new in Apache Kafka and the Confluent Platform

David Tucker, Confluent
David Ostrovsky, Couchbase

Who are we?

David Tucker, Director of Partner Engineering, Confluent

Background:
• Architect and designer
• HP Alliances: 4 CEOs, 3 enterprise hardware platforms
• Saw the Hadoop light; led partner engineering at MapR
• Better living through data (bigger, faster, better)

Expertise:
• Data management solutions
• Cloud services and orchestration

David Ostrovsky, Senior Solutions Architect, Couchbase

Background:
• Consultant and author
• Hadoop and data processing
• Wrote a couple of books about Couchbase
• Big data nerd

Expertise:
• Databases and administration
• Streaming data processing

What does Kafka do?

Kafka is much more than a pub-sub messaging system

Before: Many Ad Hoc Pipelines

Diagram: point-to-point pipelines between individual systems (labels by row):
Search, Security, Fraud Detection, Application
User Tracking, Operational Logs, Operational Metrics
Hadoop, App, Monitoring, App, Data Warehouse
Espresso, Cassandra, Oracle
Layers: Databases, Storage, Interfaces

After: Streaming Platform with Kafka

✓ Distributed
✓ Fault Tolerant
✓ Stores Messages
✓ Processes Streams (Kafka Streams)

Diagram: Kafka at the center, connected to every system (labels by row):
Search, Security, Fraud Detection, Application
User Tracking, Operational Logs, Operational Metrics
Espresso, Couchbase, Oracle
Hadoop, App, Monitoring, App, Data Warehouse

Apache Kafka: A distributed streaming platform

From Big Data to Stream Data

Big Data was: The More the Better
Chart: Value of Data vs. Volume of Data

Stream Data is: The Faster the Better
Chart: Value of Data vs. Age of Data

Stream Data can be Big or Fast (Lambda)
Diagram labels: Streams, Hadoop, DB, Speed table, Batch table

Stream Data will be Big AND Fast (Kappa)
Diagram labels: Streams, DB, Table 1, Table 2, Job 1, Job 2

Apache Kafka is the Enabling Technology of this Transition

Confluent Platform, the Enterprise Streaming Platform

Commercial

Open source

External

Auto-Data Balancing

How do I get streams of data into and out of my apps?

Connect, Clients, REST

Apache Kafka™ Connect – Streaming Data Capture

• Fault tolerant
• Manage hundreds of data sources and sinks
• Preserves data schema
• Part of Apache Kafka project
• Integrated within Confluent Platform’s Control Center

Diagram: Sources → Connectors (Kafka Connect) → Kafka Brokers → Connectors (Kafka Connect) → Sinks
Systems shown: MySQL, Couchbase, JDBC, HDFS, Couchbase, Elastic
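
As a rough illustration of how a source is attached to Kafka Connect, the sketch below builds the JSON body that a Connect worker accepts on POST /connectors. It assumes the Confluent JDBC source connector; the connector name, database URL, column name and topic prefix are hypothetical placeholders, not values from the talk.

```python
import json


def jdbc_source_connector(name, connection_url, id_column, topic_prefix, max_tasks=1):
    """Build the JSON body for POST /connectors on a Kafka Connect worker."""
    return {
        "name": name,
        "config": {
            # Connector class shipped with the Confluent JDBC connector.
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "tasks.max": str(max_tasks),
            "connection.url": connection_url,
            # "incrementing" mode detects new rows via a strictly increasing column.
            "mode": "incrementing",
            "incrementing.column.name": id_column,
            # Each table is published to the topic <topic_prefix><table name>.
            "topic.prefix": topic_prefix,
        },
    }


# Hypothetical values, for illustration only.
payload = jdbc_source_connector(
    name="mysql-orders-source",
    connection_url="jdbc:mysql://localhost:3306/shop?user=demo&password=demo",
    id_column="id",
    topic_prefix="mysql-",
)
print(json.dumps(payload, indent=2))
```

The printed document could then be posted to a Connect worker (port 8083 by default); a sink connector is registered the same way with a different connector class and settings.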

Kafka Connect Library of Connectors

Databases: JDBC*, Couchbase, Datastax / Cassandra, GoldenGate, JustOne, DynamoDB, MongoDB, HBase, InfluxDB, Kudu, RethinkDB

Datastore / File Store: HDFS*, Apache Ignite, FTP, Syslog, Hazelcast

Analytics: Elasticsearch*, Vertica, Mixpanel

Applications / Other: Attunity, AWS / S3, Bloomberg Ticker, Striim, Solr, Syncsort, Twitter

* Denotes Connectors developed at Confluent and distributed with the Confluent Platform. Extensive validation and testing has been performed.

Kafka Clients

Ruby Proxy http/REST

Stdin/stdout

Apache Kafka Native Clients

Confluent Native Clients

Community Supported Clients

REST Proxy: Enable Any Application to Access Kafka Data

REST/HTTP

REST Proxy

Schema Registry

Native Kafka Java Applications

Legacy Applications

• Provides a RESTful interface to a Kafka cluster

• Simplifies message creation and consumption

• Simplifies administrative actions
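
To make "simplifies message creation" concrete, here is a minimal sketch that builds (but does not send) an HTTP request producing JSON records through the REST Proxy. The proxy address and topic name are assumptions; the v2 JSON content type is used here, and older proxy releases expect the v1 equivalent.

```python
import json
import urllib.request


def build_produce_request(proxy_url, topic, values):
    """Build a POST /topics/<topic> request carrying JSON-encoded records."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url}/topics/{topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical proxy address and topic; nothing is sent over the network here.
req = build_produce_request("http://localhost:8082", "tweets", [{"text": "hello"}])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it, which is why any application that can speak HTTP can publish to Kafka this way.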

How do I maintain my data formats and ensure compatibility?

The Challenge of Data Compatibility at Scale

• Many sources without a policy cause mayhem in a centralized data pipeline
• Ensuring downstream systems can use the data is key to an operational stream pipeline
• Example: date formats
• Even within a single application, different formats can be presented

Diagram: App 1, App 2, App 3

Confluent: Schema Registry

• Define the expected fields for each Kafka topic
• Automatically handle schema changes (e.g. new fields)
• Prevent backwards incompatible changes
• Support multi-datacenter environments

Diagram: App 1 and App 2 (each with a Serializer), Schema Registry, Kafka Topic
Example Consumers: HDFS, Couchbase, Elastic
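
The idea of preventing backwards incompatible changes can be sketched with a deliberately simplified check on Avro-style record schemas: a new schema is accepted only if it can still read data written with the old one. This is an illustrative approximation, not the Schema Registry's actual rule set, and the schemas below are hypothetical.

```python
def is_backward_compatible(old_schema, new_schema):
    """Simplified rule: added fields need a default, kept fields keep their type."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        previous = old_fields.get(field["name"])
        if previous is None:
            # A field unknown to old data must carry a default value.
            if "default" not in field:
                return False
        elif previous["type"] != field["type"]:
            # Changing the type of an existing field is rejected in this sketch.
            return False
    return True


# Hypothetical schemas for a topic of tweets.
v1 = {"type": "record", "name": "Tweet", "fields": [
    {"name": "id", "type": "long"},
    {"name": "created", "type": "string"},
]}
v2_ok = {"type": "record", "name": "Tweet", "fields": v1["fields"] + [
    {"name": "lang", "type": "string", "default": "en"},
]}
v2_bad = {"type": "record", "name": "Tweet", "fields": v1["fields"] + [
    {"name": "lang", "type": "string"},
]}

print(is_backward_compatible(v1, v2_ok))
print(is_backward_compatible(v1, v2_bad))
```

The first call accepts the new field because it has a default; the second rejects it, which is the kind of change a registry would refuse before it reaches downstream consumers.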

How do I build stream processing apps?

Architecture of Kafka Streams, a Part of Apache Kafka™

Key Benefits
• Available as high-level DSL and low-level API, delivering maximum flexibility for application design
• No additional cluster required
• Easy to run as a service
• Security and permissions fully integrated from Kafka

Example Use Cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes

Diagram: Producer and Kafka Connect → Kafka Cluster (Topics, Kafka Streams) → Consumers and Kafka Connect

Kafka Streams simplifies your architecture, decouples your teams

Before: Undue complexity, heavy footprint, many technologies, split ownership with conflicting priorities

1 Capture business events in Kafka
2 Must process events with separate cluster (e.g. Spark)
3 Must share latest results through separate systems (e.g. MySQL)
4 Other apps access latest results by querying these DBs

With Kafka Streams: simplified, app-centric architecture, puts app owners in control

1 Capture business events in Kafka
2 Process events with standard Java apps that use Kafka Streams
3 Now other apps can directly query the latest results

Diagram labels: App, Kafka Streams, Your App, Your “Job”

How do I manage and monitor my streaming platform at scale?

Confluent Control Center: End-to-end Monitoring

See exactly where your messages are going in your Kafka cluster

Confluent Control Center: Connector Management

Control Center: Multi-Datacenter Management & Replication

Manage multi-cluster deployments

• Centralized configuration & monitoring
• Replicate clusters or selected topics
• Replication of topic configuration
• Configurable topic re-names

The Kafka Advantage

• Reliable
• Highly available
• Scalable
• Cloud Ready

Confluent Control Center: Alerting

Alerts

• Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more

• Manage alerts for different users and applications from a web UI

User authentication

• Control access to Confluent Control Center

• Integrates with existing enterprise authentication systems

Demo

Demo Scenario: Streaming Data Pipeline

• Twitter feed with sentiment data

• Twitter Source connector configured to publish data to Kafka topic

• Kafka Streams application augments Twitter records with sentiment analysis

• K-Streams output saved to Couchbase

• Couchbase Source Connector configured to pull data from Couchbase bucket back to Kafka topic

• 2nd stage Kafka Streams app saves data to another Couchbase bucket and then on to Elasticsearch
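
The augmentation step in this scenario can be pictured with a small stand-in written in plain Python rather than the Kafka Streams API. The word lists, record fields and scoring rule are hypothetical and far cruder than a real sentiment model; the point is only the record-in, enriched-record-out shape of the transformation.

```python
# Toy lexicon; a real pipeline would call a proper sentiment model.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "hate", "broken"}


def score(text):
    """Return (#positive words - #negative words) for a piece of text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def augment(records):
    """Yield each record with an added 'sentiment' field, one record at a time."""
    for record in records:
        yield {**record, "sentiment": score(record["text"])}


# Hypothetical input standing in for records read from the Twitter topic.
tweets = [
    {"id": 1, "text": "I love Kafka, it is fast!"},
    {"id": 2, "text": "The build is broken."},
]
enriched = list(augment(tweets))
for record in enriched:
    print(record)
```

In the demo, records shaped like the enriched ones are what get written to Couchbase and later forwarded to Elasticsearch.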

Couchbase Connect Demonstration

Diagram: Kafka Connect, Apache Kafka Brokers, K-Streams app(s), with data-flow steps numbered 1 to 8

Thank You

