Post on 15-Apr-2017
transcript
1Confidential
State of the Streaming Platform 2016
What’s new in Apache Kafka and the Confluent Platform
David Tucker, Confluent
David Ostrovsky, Couchbase
Who are we?
David Tucker
Director of Partner Engineering, Confluent
Background:
• Architect and designer
• HP Alliances: 4 CEOs, 3 enterprise hardware platforms
• Saw the Hadoop light; led partner engineering at MapR
• Better living through data (bigger, faster, better)
Expertise:
• Data management solutions
• Cloud services and orchestration
David Ostrovsky
Senior Solutions Architect, Couchbase
Background:
• Consultant and author
• Hadoop and data processing
• Wrote a couple of books about Couchbase
• Big data nerd
Expertise:
• Databases and administration
• Streaming data processing
Before: Many Ad Hoc Pipelines
[Diagram: many ad hoc pipelines among Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Hadoop, App, Monitoring App, and Data Warehouse; databases shown: Espresso, Cassandra, Oracle; each app has its own Databases, Storage, and Interfaces]
After: Streaming Platform with Kafka
✓ Distributed
✓ Fault Tolerant
✓ Stores Messages
✓ Processes Streams
[Diagram: Kafka, with Kafka Streams, at the center of Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Espresso, Couchbase, Oracle, Hadoop, App, Monitoring App, and Data Warehouse]
From Big Data to Stream Data
Big Data was: The More the Better
Stream Data is: The Faster the Better
Stream Data can be: Big or Fast (Lambda)
Stream Data will be: Big AND Fast (Kappa)
[Charts: Value of Data vs. Volume of Data; Value of Data vs. Age of Data]
[Diagram labels: Streams, Hadoop, DB, Speed table, Batch table, Table 1, Table 2, Job 1, Job 2]
Apache Kafka is the Enabling Technology of this Transition
Confluent Platform, the Enterprise Streaming Platform
[Diagram legend: Commercial, Open source, External; component shown: Auto-Data Balancing]
Apache Kafka™ Connect – Streaming Data Capture
• Fault tolerant
• Manages hundreds of data sources and sinks
• Preserves data schema
• Part of Apache Kafka project
• Integrated within Confluent Platform’s Control Center
[Diagram: Kafka Connect connectors link Sources and Sinks to the Kafka Brokers; systems shown: MySQL, Couchbase, JDBC, HDFS, Couchbase, Elastic]
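As a rough illustration of how a source connector gets registered, the sketch below builds (without sending) a request for the Kafka Connect REST API, which accepts a connector name plus a flat config map on `POST /connectors`. The host, port, connector name, file path, and topic are hypothetical placeholders; the file source connector is used only because it ships with Apache Kafka, not because the slides use it.

```python
import json
import urllib.request


def connector_request(base_url, name, config):
    """Build, but do not send, a POST /connectors request for Kafka Connect."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All values below are assumptions chosen for illustration.
connector_req = connector_request(
    "http://localhost:8083",
    "demo-file-source",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo-input.txt",
        "topic": "demo-topic",
    },
)
print(connector_req.get_method(), connector_req.full_url)
print(connector_req.data.decode("utf-8"))
```

Passing `connector_req` to `urllib.request.urlopen` would submit it to a running Connect worker; a sink connector is registered the same way with a different `connector.class` and its own settings.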
Kafka Connect Library of Connectors
Databases: JDBC*, Couchbase, DataStax / Cassandra, GoldenGate, JustOne, DynamoDB, MongoDB, HBase, InfluxDB, Kudu, RethinkDB
Datastore / File Store: HDFS*, Apache Ignite, FTP, Syslog, Hazelcast
Analytics: Elasticsearch*, Vertica, Mixpanel
Applications / Other: Attunity, AWS / S3, Bloomberg Ticker, Striim, Solr, Syncsort, Twitter
* Denotes connectors developed at Confluent and distributed with the Confluent Platform. Extensive validation and testing have been performed.
Kafka Clients
Apache Kafka Native Clients
Confluent Native Clients
Community Supported Clients
[Client logos include: Ruby, Proxy (HTTP/REST), stdin/stdout]
REST Proxy: Enable Any Application to Access Kafka Data
[Diagram: Native Kafka Java Applications, Legacy Applications, REST/HTTP, REST Proxy, Schema Registry]
• Provides a RESTful interface to a Kafka cluster
• Simplifies message creation and consumption
• Simplifies administrative actions
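To make the "any application" point concrete, here is a minimal sketch of how a non-Java client might prepare a produce call for the REST Proxy. It assumes the proxy's v2 JSON embedded format (`POST /topics/{topic}` with a `records` array); the proxy address, topic name, and record contents are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def produce_request(proxy_url, topic, values):
    """Build, but do not send, a JSON produce request for the REST Proxy."""
    records = [{"value": value} for value in values]
    return urllib.request.Request(
        url=f"{proxy_url}/topics/{topic}",
        data=json.dumps({"records": records}).encode("utf-8"),
        # Content type for the v2 JSON embedded format (an assumption about
        # the proxy version in use; older releases used a v1 media type).
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical proxy address, topic, and payload.
produce_req = produce_request(
    "http://localhost:8082",
    "tweets",
    [{"user": "alice", "text": "hello"}],
)
print(produce_req.get_method(), produce_req.full_url)
```

Any language with an HTTP client can issue the same request, which is the appeal for legacy applications that cannot embed a native Kafka client.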
The Challenge of Data Compatibility at Scale
• Many sources without a policy cause mayhem in a centralized data pipeline
• Ensuring downstream systems can use the data is key to an operational stream pipeline
• Example: date formats
• Even within a single application, different formats can be presented
[Diagram: App 1, App 2, App 3]
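The date-format example can be sketched as follows: three producers emit the same day in three layouts, and a downstream consumer has to normalize them before use. The list of accepted formats is an assumption made for illustration, not something the slides specify.

```python
from datetime import datetime

# Formats this hypothetical pipeline agrees to accept, tried in order.
KNOWN_FORMATS = ("%Y-%m-%d", "%d-%b-%Y", "%m/%d/%Y")


def normalize_date(text):
    """Return the date as an ISO 8601 string, or raise if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


samples = ["2017-04-15", "15-Apr-2017", "04/15/2017"]
normalized = [normalize_date(sample) for sample in samples]
print(normalized)
```

Guessing like this only goes so far: a value such as `04/05/2017` parses without error but could mean April 5 or May 4, which is why an agreed schema is safer than per-consumer heuristics.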
Confluent: Schema Registry
• Define the expected fields for each Kafka topic
• Automatically handle schema changes (e.g. new fields)
• Prevent backward-incompatible changes
• Support multi-datacenter environments
[Diagram: App 1 and App 2, each with a Serializer, use the Schema Registry and write to a Kafka Topic; example consumers: HDFS, Couchbase, Elastic]
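One way to picture "prevent backward-incompatible changes" is a check that a new schema can still read records written with the old one. The sketch below applies a deliberately simplified, Avro-style rule: a field added in the new schema must carry a default, and a field present in both must keep its type. It is not the Schema Registry's actual implementation, and the dictionary layout for schemas is an assumption; real Avro resolution also allows some type promotions that this check rejects.

```python
def is_backward_compatible(old_fields, new_fields):
    """Simplified check: can a reader using new_fields read old_fields data?"""
    for name, spec in new_fields.items():
        if name in old_fields:
            # A field kept across versions must not change type here.
            if old_fields[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # A new field needs a default to fill in for old records.
            return False
    return True


v1 = {"user": {"type": "string"}, "ts": {"type": "long"}}
v2_ok = {**v1, "lang": {"type": "string", "default": "en"}}
v2_bad = {**v1, "lang": {"type": "string"}}

print(is_backward_compatible(v1, v2_ok))   # new field has a default
print(is_backward_compatible(v1, v2_bad))  # new field lacks a default
```

Dropping a field passes this check, because a reader that no longer asks for the field can simply ignore it in old records.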
Architecture of Kafka Streams, a Part of Apache Kafka™
Key Benefits
• Available as high-level DSL and low-level API, delivering maximum flexibility for application design
• No additional cluster required
• Easy to run as a service
• Security and permissions fully integrated from Kafka
Example Use Cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes
[Diagram: Kafka Cluster with Topics and Kafka Streams; Producer, Consumers, and Kafka Connect attached on either side]
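The "continuous transformations" use case can be pictured with a plain Python generator that handles one event at a time as it arrives. This is a conceptual stand-in only, not the Kafka Streams DSL (which is a Java library); the event fields and the filter rule are hypothetical.

```python
def continuous_transform(events):
    """Lazily filter and reshape events, one record at a time."""
    for event in events:
        if event.get("type") != "purchase":
            continue  # drop everything that is not a purchase
        yield {
            "user": event["user"],
            "amount_cents": round(event["amount"] * 100),
        }


# A finite list stands in for an unbounded input topic.
incoming = [
    {"type": "view", "user": "alice"},
    {"type": "purchase", "user": "bob", "amount": 19.99},
    {"type": "purchase", "user": "carol", "amount": 5.0},
]
transformed = list(continuous_transform(incoming))
print(transformed)
```

Because the generator never needs the whole input at once, the same function works unchanged on a source that never ends, which is the essential difference from a batch job.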
Kafka Streams simplifies your architecture, decouples your teams
Before: Undue complexity, heavy footprint, many technologies, split ownership with conflicting priorities
1 Capture business events in Kafka
2 Must process events with separate cluster (e.g. Spark)
3 Must share latest results through separate systems (e.g. MySQL)
4 Other apps access latest results by querying these DBs
With Kafka Streams: simplified, app-centric architecture, puts app owners in control
1 Capture business events in Kafka
2 Process events with standard Java apps that use Kafka Streams
3 Now other apps can directly query the latest results
[Diagram labels: App, Kafka Streams, Your App, Your “Job”]
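The difference between the two architectures comes down to where the latest results live. The sketch below keeps a running count inside the processing application itself and exposes it through a query method, so no separate database sits between the processor and its readers. It is a plain Python illustration of the idea, not the Kafka Streams API, and the class and field names are hypothetical.

```python
from collections import Counter


class PageViewCounts:
    """Keeps per-page counts in local state and answers queries directly."""

    def __init__(self):
        self._counts = Counter()

    def process(self, event):
        # Update local state as each event arrives.
        self._counts[event["page"]] += 1

    def query(self, page):
        # Other apps read the latest result straight from the processor.
        return self._counts[page]


counts = PageViewCounts()
for event in [{"page": "/home"}, {"page": "/pricing"}, {"page": "/home"}]:
    counts.process(event)

print(counts.query("/home"))
```

In the "before" picture, the `query` step would instead be a read against an external store such as MySQL that a separate cluster had to keep up to date.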
Confluent Control Center: End-to-end Monitoring
See exactly where your messages are going in your Kafka cluster
Control Center: Multi-Datacenter Management & Replication
Manage multi-cluster deployments
• Centralized configuration & monitoring
• Replicate clusters or selected topics
• Replication of topic configuration
• Configurable topic renames
The Kafka Advantage
• Reliable
• Highly available
• Scalable
• Cloud Ready
Confluent Control Center: Alerting
Alerts
• Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
• Manage alerts for different users and applications from a web UI
User authentication
• Control access to Confluent Control Center
• Integrates with existing enterprise authentication systems
Demo Scenario: Streaming Data Pipeline
• Twitter feed with sentiment data
• Twitter Source connector configured to publish data to Kafka topic
• Kafka Streams application augments Twitter records with sentiment analysis
• K-Streams output saved to Couchbase
• Couchbase Source Connector configured to pull data from Couchbase bucket back to Kafka topic
• 2nd stage Kafka Streams app saves data to another Couchbase bucket and then on to Elasticsearch
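The first enrichment stage of the scenario above can be sketched in a few lines. Everything here is a stand-in: the word lists are a toy lexicon rather than a real sentiment model, the tweets are made up, and a dictionary plays the role of the Couchbase bucket.

```python
# Toy lexicon; a real pipeline would use a proper sentiment model.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "broken"}


def add_sentiment(tweet):
    """Return a copy of the tweet with a coarse sentiment label attached."""
    words = [word.strip(".,!?") for word in tweet["text"].lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {**tweet, "sentiment": label}


tweets = [
    {"id": "1", "text": "I love Kafka, it is fast!"},
    {"id": "2", "text": "The build is broken"},
    {"id": "3", "text": "Reading about streams"},
]

bucket = {}  # dictionary standing in for a Couchbase bucket
for tweet in tweets:
    bucket[tweet["id"]] = add_sentiment(tweet)

print(bucket["1"]["sentiment"], bucket["2"]["sentiment"], bucket["3"]["sentiment"])
```

In the demo, the source connector would then pick up these documents from the bucket and publish them back to a Kafka topic for the second-stage application.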
Couchbase Connect Demonstration
[Diagram: Kafka Connect, Apache Kafka Brokers, and K-Streams app(s), with data-flow steps numbered 1 through 8]