#MongoDBWebinar | @mongodb
Data Streaming with Apache Kafka & MongoDB
Andrew Morgan – MongoDB Product Marketing
David Tucker – Director, Partner Engineering and Alliances at Confluent
13th September 2016
Agenda
Target Audience
Apache Kafka
MongoDB
Integrating MongoDB and Kafka
Kafka – What’s Next
Next Steps
What does Kafka do?
Diagram: Producers and Kafka Connect write into a Topic; Consumers and Kafka Connect read from it.
• Producers and Consumers: your interfaces to the world
• Kafka Connect: connected to your systems in real time
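To make the producer/topic/consumer relationship concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the `Topic` class and its `produce` and `poll` methods are hypothetical stand-ins that model only the core idea that a topic is an append-only log and that each consumer group tracks its own read offset.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Topic:
    """Hypothetical stand-in for a single-partition Kafka topic: an append-only log."""
    name: str
    log: List[Tuple[str, Any]] = field(default_factory=list)
    # Committed read position per consumer group (the next offset to read).
    offsets: Dict[str, int] = field(default_factory=dict)

    def produce(self, key: str, value: Any) -> int:
        """Append a record and return its offset, as a producer would."""
        self.log.append((key, value))
        return len(self.log) - 1

    def poll(self, group: str, max_records: int = 10) -> List[Tuple[int, str, Any]]:
        """Return unread records for a consumer group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return [(start + i, k, v) for i, (k, v) in enumerate(batch)]


orders = Topic("orders")
orders.produce("cust-1", {"amount": 25})
orders.produce("cust-2", {"amount": 40})

# Two independent (hypothetical) consumer groups each see the full stream.
print(orders.poll("mongodb-sink"))  # both records
print(orders.poll("analytics"))     # both records again
print(orders.poll("mongodb-sink"))  # empty: nothing new yet
```

Because records stay in the log after they are read, adding another consumer (for example a MongoDB sink) does not affect existing ones.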
What is Streaming Data?
• Synchronous request/response: 0 to 100s of ms
• Near real time: > 100s of ms
• Offline batch: > 1 hour
Kafka Stream Data Platform, connecting: Search, RDBMS, Apps, Monitoring, Real-time Analytics, NoSQL, Stream Processing, and the Hadoop Data Lake (Impala, DWH, Hive, Spark, Map-Reduce)
Confluent’s Offerings
• Core: Kafka, Java Client, Connect, Streams
• Confluent Platform: More Clients, REST Proxy, Schema Registry, Pre-Built Connectors
• Confluent Platform Enterprise: Stream Monitoring, Message Delivery, Connector Management
Confluent Platform: It’s Kafka ++
Feature: Benefit
• Apache Kafka: High-throughput, low-latency, highly available, secure distributed messaging system
• Kafka Connect: Advanced framework for connecting external sources and destinations to Kafka
• Java Client: Provides easy integration into Java applications
• Kafka Streams: Simple library that enables streaming application development within the Kafka framework
• Additional Clients: Supports non-Java clients (C, C++, Python, etc.)
• REST Proxy: Provides universal access to Kafka from any network-connected device via HTTP
• Schema Registry: Central registry for the format of Kafka data; guarantees all data is always consumable
• Pre-Built Connectors: HDFS, JDBC and other connectors, fully certified and fully supported by Confluent
• Confluent Control Center: Includes connector management and stream monitoring
Support and cost (Apache Kafka / Confluent Platform / Confluent Platform Enterprise):
• Support (enterprise-class support to keep your Kafka environment running at top performance): Community / Community / 24x7x365
• Cost: Free / Free / Subscription
Common Kafka Use Cases
Data transport and integration
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Stock ticker data
Real-time stream processing
• Monitoring
• Asynchronous applications
• Fraud and security
People Using Kafka Today
• Financial Services
• Entertainment & Media
• Consumer Tech
• Travel & Leisure
• Enterprise Tech
• Telecom
• Retail
Relational
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
NoSQL
Scalability & Performance
Always On, Global Deployments
Flexibility
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
Nexus Architecture
Scalability & Performance
Always On, Global Deployments
Flexibility
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
Where MongoDB Fits
Diagram: a stream-processing pipeline built on Kafka topics
• Prod → Topic A → Filter
• Prod → Topic B → Filter
• Filtered streams from Topics A and B → Merge → Topic C
• Topic C → Analyze → Topic D → Take Action
• MongoDB as the Operational Database: Store Results, Key Events, Reference Data
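The pipeline in this diagram can be sketched in plain Python. Lists stand in for topics and dictionaries stand in for MongoDB collections; the event fields (`product`, `qty`, `price`), the filter rule, the key-event threshold and the reference-data lookup are all hypothetical choices made for illustration, not part of the original architecture.

```python
from typing import Dict, List

# Hypothetical reference data, as it might be held in MongoDB (product id -> details).
reference_data: Dict[str, dict] = {
    "p1": {"name": "Widget", "category": "tools"},
    "p2": {"name": "Gadget", "category": "toys"},
}

# Stand-ins for Topic A and Topic B.
topic_a = [{"product": "p1", "qty": 3, "price": 10.0}, {"product": "p2", "qty": 0, "price": 5.0}]
topic_b = [{"product": "p2", "qty": 50, "price": 5.0}]

# Stand-ins for MongoDB collections: results keyed by product, plus a key-events log.
results: Dict[str, dict] = {}
key_events: List[dict] = []


def keep(event: dict) -> bool:
    """Filter step: drop events with no quantity."""
    return event["qty"] > 0


# Filter each topic, then merge into Topic C.
topic_c = [e for e in topic_a if keep(e)] + [e for e in topic_b if keep(e)]

# Analyze step: enrich with reference data and compute revenue, producing Topic D.
topic_d = [
    {**e, **reference_data.get(e["product"], {}), "revenue": e["qty"] * e["price"]}
    for e in topic_c
]

for event in topic_d:
    # Store results: accumulate revenue per product (an upsert in a real database).
    doc = results.setdefault(event["product"], {"revenue": 0.0, "orders": 0})
    doc["revenue"] += event["revenue"]
    doc["orders"] += 1
    # Take action / key events: record unusually large orders (threshold is arbitrary).
    if event["revenue"] >= 100:
        key_events.append(event)

print(results)
print([e["name"] for e in key_events])
```

In a real deployment each stage would consume from and produce to Kafka topics, and the dictionary writes would become MongoDB upserts and inserts.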
Where K-Streams Fits
• Prod → Topic A; Prod → Topic B
• Kafka Streams takes over the Filter and Merge steps, writing the combined stream to Topic C
• Topic C → Analyze → Topic D → Take Action
• MongoDB as the Operational Database: Store Results, Key Events, Reference Data
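Kafka Streams runs this kind of logic as a library inside the application. Its real API is Java (built around `KStream` and `KTable`); the Python below is only a conceptual sketch of what such a topology does: filter and merge two input streams, then maintain a running per-key aggregate in a local state store and emit every update downstream, much like a count table. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value), e.g. (product id, quantity)


def topology(topic_a: Iterable[Record], topic_b: Iterable[Record]) -> Tuple[Dict[str, int], List[Record]]:
    """Filter + merge two streams, then keep a running sum per key (a table-like view)."""
    state: Dict[str, int] = defaultdict(int)  # local state store
    changelog: List[Record] = []              # updates emitted downstream (e.g. to Topic C)
    # Merge: concatenated here for determinism; a live merge interleaves records as they arrive.
    merged = [r for r in topic_a if r[1] > 0] + [r for r in topic_b if r[1] > 0]
    for key, value in merged:
        state[key] += value
        changelog.append((key, state[key]))
    return dict(state), changelog


table, updates = topology([("p1", 3), ("p2", 0)], [("p1", 2), ("p2", 7)])
print(table)    # latest value per key
print(updates)  # every intermediate update, in processing order
```

The table holds only the latest value per key, while the changelog keeps every update; downstream consumers can choose whichever view they need.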
Design Pattern: Operationalized Data Lake
Sources: Sensors, User Data, Clickstreams, Logs → Message Queue, with Kafka Streams processing
• Raw Data lands in distributed processing frameworks (Batch Processing, Batch Views): multi-minute latency with scans across TB/PB of data; no indexes; data stored in 128MB blocks; write-once-read-many, append-only storage model. Outputs: Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics
• Processed Events land in MongoDB (Real-Time Access): millisecond latency; expressive querying and flexible indexing against subsets of data; updates in place; in-database aggregations and transformations. Serves: Customer Data Mgmt, Mobile App, IoT App, Live Dashboards
Steps:
1. Configure where to land incoming data
2. Raw data is processed to generate analytics models
3. MongoDB exposes the analytics models to operational apps and handles real-time updates
4. Compute new models against MongoDB & HDFS
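A toy sketch of the pattern's data flow, assuming in-memory stand-ins: a list plays the append-only data lake (written once, scanned in full) and a dictionary plays MongoDB (indexed by key and updated in place). The churn "model" is a deliberately trivial, hypothetical score; the point is only the division of labour between batch computation and real-time serving.

```python
from typing import Dict, List

data_lake: List[dict] = []              # append-only raw events (HDFS stand-in)
operational_db: Dict[str, dict] = {}    # per-customer documents (MongoDB stand-in)


def land(event: dict) -> None:
    """Step 1: land every raw event in the lake; apply processed updates in place."""
    data_lake.append(event)  # never modified afterwards
    profile = operational_db.setdefault(event["customer"], {"events": 0, "churn_score": None})
    profile["events"] += 1   # real-time update of the operational view


def batch_compute_churn() -> Dict[str, float]:
    """Step 2: full scan of the lake to build a (hypothetical) churn score per customer."""
    counts: Dict[str, int] = {}
    complaints: Dict[str, int] = {}
    for e in data_lake:
        counts[e["customer"]] = counts.get(e["customer"], 0) + 1
        if e["type"] == "complaint":
            complaints[e["customer"]] = complaints.get(e["customer"], 0) + 1
    return {c: complaints.get(c, 0) / n for c, n in counts.items()}


def publish(model: Dict[str, float]) -> None:
    """Step 3: expose model output to operational apps by writing it into the operational store."""
    for customer, score in model.items():
        operational_db[customer]["churn_score"] = score


for ev in [
    {"customer": "c1", "type": "click"},
    {"customer": "c1", "type": "complaint"},
    {"customer": "c2", "type": "click"},
]:
    land(ev)

publish(batch_compute_churn())
print(operational_db)
```

Step 4 would rerun the batch job, possibly reading the current operational documents as well as the lake, and publish the refreshed scores the same way.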
https://www.mongodb.com/presentations/replacing-traditional-technologies-mongodb-single-platform-all-financial-data-ahl
http://www.slideshare.net/danharvey/change-data-capture-with-mongodb-and-kafka
Kafka Connectors
• Confluent-supported connectors (included in Confluent Platform), such as JDBC
• Community-written connectors (just a sampling)
Kafka Futures
• Apache Core
  • Admin API (KIP-4)
  • Exactly-once delivery semantics
  • Time-based topic indexing
• Kafka Streams
  • Exactly-once processing semantics
  • Interactive Queries: enable real-time sharing of application state with other applications
• Confluent Platform Enterprise
  • Multi-cluster views and alerting added to Control Center
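Before exactly-once semantics are available, a common way to get effectively-once results in a database sink is to make writes idempotent, so a record redelivered after a consumer restart does no harm. The sketch below shows that technique, not Kafka's exactly-once implementation: it assumes each record carries a unique `(partition, offset)` pair and uses it as the document key, which is how an upsert on `_id` in MongoDB would behave. `IdempotentSink` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, int, dict]  # (partition, offset, payload)


class IdempotentSink:
    """Hypothetical sink: writes keyed by (partition, offset), so replays overwrite rather than duplicate."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def write(self, record: Record) -> None:
        partition, offset, payload = record
        doc_id = f"{partition}-{offset}"                 # deterministic _id from the record position
        self.docs[doc_id] = {"_id": doc_id, **payload}   # upsert: insert or replace


batch: List[Record] = [(0, 10, {"amount": 5}), (0, 11, {"amount": 7})]
sink = IdempotentSink()
for r in batch:
    sink.write(r)
# Simulate a crash before the offset commit: the same batch is delivered again.
for r in batch:
    sink.write(r)

print(len(sink.docs))                                # 2, not 4
print(sum(d["amount"] for d in sink.docs.values()))  # 12
```

A business key (such as an order id) works just as well as the record position, provided it is unique per logical event.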
MongoDB Atlas: Database as a service for MongoDB
MongoDB Atlas is…
• Automated: The easiest way to build, launch, and scale apps on MongoDB
• Flexible: The only database as a service with all you need for modern applications
• Secured: Multiple levels of security available to give you peace of mind
• Scalable: Deliver massive scalability with zero downtime as you grow
• Highly available: Your deployments are fault-tolerant and self-healing by default
• High performance: The performance you need for your most demanding workloads
MongoDB Atlas Features
Run for You
• Spin up a cluster in seconds
• Replicated & always-on deployments
• Fully elastic: scale out or up in a few clicks with zero downtime
• Automatic patches & simplified upgrades for the newest MongoDB features
Safe & Secure
• Authenticated & encrypted
• Continuous backup with point-in-time recovery
• Fine-grained monitoring & custom alerts
No Lock-In
• On-demand pricing model; billed by the hour
• Multi-cloud support (AWS available with others coming soon)
• Part of a suite of products & services designed for all phases of your app; migrate easily to different environments (private cloud, on-prem, etc.) when needed
MongoDB Enterprise Advanced
Tooling
• MongoDB Ops Manager or MongoDB Cloud Manager Premium
• MongoDB Compass
• MongoDB Connector for BI
Security
• Encrypted Storage Engine
• LDAP / Kerberos Integration
• DDL & DML Auditing
• FIPS 140-2 Support
Support
• 24 x 7 Support
• 1 hr SLA
• Emergency Patches
• Customer Success Program
• On-Demand Training
License
• Commercial License
Resources
• Data Streaming with Apache Kafka & MongoDB
  https://www.mongodb.com/collateral/data-streaming-with-apache-kafka-and-mongodb
• Implementing a Kafka Consumer for MongoDB
  https://www.mongodb.com/blog/post/mongodb-and-data-streaming-implementing-a-mongodb-kafka-consumer
• Tailing the Oplog on a sharded MongoDB Cluster
  https://www.mongodb.com/blog/post/tailing-mongodb-oplog-sharded-clusters
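The oplog-tailing approach in the last two resources turns each replica-set oplog entry into a change event. A minimal sketch, assuming oplog entries are already available as dictionaries (a real implementation would read `local.oplog.rs` with a tailable cursor through a driver): it uses the oplog `op` codes (`i` insert, `u` update, `d` delete) and the `ns`, `o` and `o2` fields, and emits a `(topic, key, value)` tuple per entry, with `None` as a delete tombstone. The topic naming and record shape are hypothetical.

```python
from typing import Any, List, Optional, Tuple

ChangeRecord = Tuple[str, Any, Optional[dict]]  # (topic, key, value)


def oplog_to_record(entry: dict) -> Optional[ChangeRecord]:
    """Convert one oplog entry into a change record; skip no-ops and commands."""
    op, ns = entry["op"], entry["ns"]
    topic = ns.replace(".", "_")  # hypothetical naming: one topic per namespace
    if op == "i":                 # insert: 'o' holds the full new document
        return topic, entry["o"]["_id"], {"op": "insert", "doc": entry["o"]}
    if op == "u":                 # update: 'o2' identifies the doc, 'o' holds the change
        # (the exact format of 'o' for updates varies by server version; passed through as-is)
        return topic, entry["o2"]["_id"], {"op": "update", "change": entry["o"]}
    if op == "d":                 # delete: emit a tombstone (None value) for the key
        return topic, entry["o"]["_id"], None
    return None                   # 'n' (no-op), 'c' (command), etc.


oplog = [
    {"op": "i", "ns": "shop.customers", "o": {"_id": 1, "name": "Mark"}},
    {"op": "u", "ns": "shop.customers", "o2": {"_id": 1}, "o": {"$set": {"city": "SF"}}},
    {"op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"op": "d", "ns": "shop.customers", "o": {"_id": 1}},
]
records: List[ChangeRecord] = [r for r in map(oplog_to_record, oplog) if r is not None]
for r in records:
    print(r)
```

Keying each record by `_id` keeps all changes to one document in order on the same Kafka partition, and the tombstone lets a compacted topic eventually drop deleted documents.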
Old Billingsgate, London, 15th November
mongodb.com/europe
Use my discount code for 20% off: andrewmorgan20
Document Data Model: Relational vs. MongoDB
MongoDB
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones: [
    {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }
  ]
}
Relational
Customer ID | First Name | Last Name | City
0 | John | Doe | New York
1 | Mark | Smith | San Francisco
2 | Jay | Black | Newark
3 | Meagan | White | London
4 | Edward | Daniels | Boston
Phone Number | Type | DNC | Customer ID
1-212-555-1212 | home | T | 0
1-212-555-1213 | home | T | 0
1-212-555-1214 | cell | F | 0
1-212-777-1212 | home | T | 1
1-212-777-1213 | cell | (null) | 1
1-212-888-1212 | home | F | 2
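The two relational tables map onto the MongoDB document by embedding each customer's phone rows as an array. A short sketch of that transformation, using the sample rows from the tables (only the first two customers, for brevity); mapping the `DNC` values `T`, `F` and `(null)` to `true`, `false` or an omitted field is an assumption chosen to mirror Mark Smith's document.

```python
from typing import Dict, List

customers = [
    (0, "John", "Doe", "New York"),
    (1, "Mark", "Smith", "San Francisco"),
]
phones = [  # (number, type, dnc, customer_id); dnc is 'T', 'F' or None
    ("1-212-555-1212", "home", "T", 0),
    ("1-212-555-1213", "home", "T", 0),
    ("1-212-555-1214", "cell", "F", 0),
    ("1-212-777-1212", "home", "T", 1),
    ("1-212-777-1213", "cell", None, 1),
]


def to_documents() -> List[dict]:
    """Join phones to customers and embed them, one document per customer."""
    by_customer: Dict[int, List[dict]] = {}
    for number, ptype, dnc, cid in phones:
        phone: dict = {"number": number}
        if dnc is not None:  # a NULL flag is simply left out of the document
            phone["dnc"] = dnc == "T"
        phone["type"] = ptype
        by_customer.setdefault(cid, []).append(phone)
    return [
        {"customer_id": cid, "first_name": first, "last_name": last, "city": city,
         "phones": by_customer.get(cid, [])}
        for cid, first, last, city in customers
    ]


mark = to_documents()[1]
print(mark)
```

The join happens once, at write time, so reading a customer with all of their phones becomes a single document lookup.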
Document Model Benefits
{ customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco",
  phones: [ { number : "1-212-777-1212", dnc : true, type : "home" },
            { number : "1-212-777-1213", type : "cell" } ] }
Agility and flexibility
• Data model supports business change
• Rapidly iterate to meet new requirements
Intuitive, natural data representation
• Eliminates ORM layer
• Developers are more productive
Reduces the need for joins, disk seeks
• Programming is simpler
• Performance delivered at scale
Rich Functionality: MongoDB
Expressive Queries
• Find anyone with phone # “1-212…”
• Check if the person with number “555…” is on the “do not call” list
Geospatial
• Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count the customers in each city and sort the results
Native Binary JSON support
• Add another phone number to Mark Smith’s document without rewriting it
• Select just the mobile phone number in the list
• Sort on the modified date
{ customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco",
  phones: [ { number : "1-212-777-1212", dnc : true, type : "home" },
            { number : "1-212-777-1213", type : "cell" } ] }
Left outer join ($lookup)
• Query for all San Francisco residents, look up their transactions, and sum the amount by person
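Two of these examples expressed as MongoDB aggregation pipelines. The pipelines are plain data using real stages (`$group`, `$sort`, `$match`, `$lookup`, `$project`); with PyMongo they would be passed to `collection.aggregate(...)`. Because that needs a running server, the sketch also evaluates the same logic over in-memory lists so it runs standalone. The `transactions` collection, its `customer_id`/`amount` fields and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, List

customers = [
    {"customer_id": 1, "last_name": "Smith", "city": "San Francisco"},
    {"customer_id": 2, "last_name": "Black", "city": "Newark"},
    {"customer_id": 5, "last_name": "Lee", "city": "San Francisco"},
]
transactions = [  # hypothetical collection
    {"customer_id": 1, "amount": 20},
    {"customer_id": 1, "amount": 15},
    {"customer_id": 2, "amount": 99},
]

# Pipelines as they could be sent to MongoDB (with PyMongo: db.customers.aggregate(count_by_city)).
count_by_city = [
    {"$group": {"_id": "$city", "customers": {"$sum": 1}}},
    {"$sort": {"customers": -1}},
]
sf_spend = [
    {"$match": {"city": "San Francisco"}},
    {"$lookup": {"from": "transactions", "localField": "customer_id",
                 "foreignField": "customer_id", "as": "txns"}},
    {"$project": {"last_name": 1, "total": {"$sum": "$txns.amount"}}},
]

# In-memory equivalents, so the logic can be checked without a server.
city_counts = Counter(c["city"] for c in customers).most_common()


def sf_totals() -> Dict[str, int]:
    """Match San Francisco, left-outer-join transactions, sum amounts per person."""
    out: Dict[str, int] = {}
    for c in customers:
        if c["city"] != "San Francisco":
            continue
        txns: List[dict] = [t for t in transactions if t["customer_id"] == c["customer_id"]]
        out[c["last_name"]] = sum(t["amount"] for t in txns)  # left outer: no txns -> 0
    return out


print(city_counts)   # [('San Francisco', 2), ('Newark', 1)]
print(sf_totals())   # {'Smith': 35, 'Lee': 0}
```

Because `$lookup` is a left outer join, a resident with no transactions still appears, with a total of zero.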
MongoDB Technical Capabilities
Diagram: Application → Driver → Mongos → Shard 1, Shard 2, … Shard N (each shard a replica set of one Primary and two Secondaries)
db.customer.insert({…})
db.customer.find({ name: "John Smith" })
1. Dynamic document schema
   { name: "John Smith",
     date: "2013-08-01",
     address: "10 3rd St.",
     phone: {
       home: 1234567890,
       mobile: 1234568138 }
   }
2. Native language drivers
3. High availability
   • Replica sets
4. High performance
   • Data locality
   • Indexes
   • RAM
5. Horizontal scalability
   • Sharding
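Horizontal scalability relies on mongos sending each operation to the shard that owns the relevant range of the shard key. The sketch below models range-based routing only, with hypothetical chunk boundaries on a hypothetical `customer_id` shard key; real clusters also support hashed sharding, split and migrate chunks automatically, and keep chunk metadata on config servers.

```python
import bisect
from typing import Dict, List

# Hypothetical chunks: [MinKey, 1000) -> shard1, [1000, 5000) -> shard2, [5000, MaxKey) -> shard3
boundaries: List[int] = [1000, 5000]
shards: List[str] = ["shard1", "shard2", "shard3"]


def route(customer_id: int) -> str:
    """Return the shard owning the chunk that contains this shard-key value."""
    return shards[bisect.bisect_right(boundaries, customer_id)]


def route_range(lo: int, hi: int) -> List[str]:
    """Shards a query on lo <= customer_id < hi must visit (assumes integer keys, hi > lo)."""
    first = bisect.bisect_right(boundaries, lo)
    last = bisect.bisect_right(boundaries, hi - 1)
    return shards[first:last + 1]


placement: Dict[int, str] = {cid: route(cid) for cid in (1, 999, 1000, 4999, 5000, 10**6)}
print(placement)
print(route_range(900, 1100))  # ['shard1', 'shard2']
```

A query that includes the shard key is targeted at the one or few shards whose ranges it covers; a query without it has to be broadcast to every shard and the results merged by mongos.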