Date post: | 21-Feb-2017 |
Category: | Technology |
Upload: | willem-meints |
Big data streaming
Willem Meints
Microservices & analytics
Event bus
Microservices
• Multiple smaller services that scale independently
• Each service has its own data store
• Data flows between services through the event bus
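The bullets above can be sketched in a few lines of Python. This is a toy in-memory stand-in, not a real message broker: the `EventBus` class, the `order_placed` topic and the two stores are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Toy in-memory event bus: handlers subscribe to a topic and
    receive every event published on that topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every handler registered for the topic.
        for handler in self._subscribers[topic]:
            handler(event)


# Each service keeps its own data store (plain lists in this sketch).
invoice_store: List[dict] = []
analytics_store: List[dict] = []

bus = EventBus()
# The (hypothetical) invoice service stores only the fields it needs.
bus.subscribe(
    "order_placed",
    lambda e: invoice_store.append({"order_id": e["order_id"], "amount": e["amount"]}),
)
# The (hypothetical) analytics service stores the raw event.
bus.subscribe("order_placed", analytics_store.append)

# The order service publishes; it does not know who is listening.
bus.publish("order_placed", {"order_id": 1, "amount": 25.0})
```

The publisher never calls the other services directly, which is what lets each service scale and evolve on its own.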
Rapids
Rivers
Lakes
Data analytics challenges with microservices
• A complete picture is there, but spread over a vast landscape
• Most data doesn’t come in a database
• Data changes rapidly
Exploring some scenarios
Scenario 1: Get an annual sales report
• The goal is to get a complete picture of the situation
• Data based on business events
Orders Invoices
Event bus
Data analytics
Data Lake
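A minimal sketch of scenario 1, assuming the business events have already landed in the data lake as simple records. The event types, field names and amounts are made up for illustration; a real report would run on a processing engine rather than a Python loop.

```python
from collections import defaultdict
from datetime import date

# Hypothetical business events as they might be stored in the data lake.
events = [
    {"type": "invoice_paid", "date": date(2016, 3, 14), "amount": 120.0},
    {"type": "invoice_paid", "date": date(2016, 11, 2), "amount": 80.0},
    {"type": "order_placed", "date": date(2016, 11, 1), "amount": 80.0},
    {"type": "invoice_paid", "date": date(2017, 1, 9), "amount": 50.0},
]


def annual_sales(events):
    """Sum the amounts of paid invoices per calendar year."""
    totals = defaultdict(float)
    for event in events:
        # Only paid invoices count as sales in this sketch.
        if event["type"] == "invoice_paid":
            totals[event["date"].year] += event["amount"]
    return dict(totals)


report = annual_sales(events)
```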
Scenario 2: Detect anomalies
• The goal is to detect anomalies on the website and prevent abuse
• Machine learning needed to detect the anomalies
• Data based on the data lake
Click stream collector
Event bus
Data analytics
Data Lake
Model
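To make the idea of a model trained on the data lake and applied to the live click stream concrete, here is a deliberately simple sketch. It swaps the machine learning model for a plain z-score baseline, which is an assumption made for brevity and not the technique used in the talk; the request counts are made up.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """'Train' on historical data: remember the mean and standard deviation."""
    return mean(history), stdev(history)


def is_anomaly(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical requests per minute for one client, taken from the data lake.
history = [20, 22, 19, 21, 23, 20, 18, 22]
baseline = fit_baseline(history)

# New values arriving from the click stream are scored against the baseline.
flags = [is_anomaly(v, baseline) for v in [21, 250]]
```

The split is the point of the sketch: the baseline is built offline from stored data, while scoring happens per event on the stream.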
Analytics tools
vs
Event bus Data processing tool
Distributed database
Alerting
Dashboarding
Flow control logic
Cluster Manager
The Azure based solution
Azure Event Hub HDInsight
Azure Data Lake
Alerting
Dashboarding
Azure App Services
Cluster Manager
Demo
A short introduction to Apache Spark
Spark SQL Spark Streaming Machine Learning GraphX
Apache Spark Core
Resilient Distributed Datasets
Resilient Distributed Dataset
Partition
Record Record
Partition
Record Record
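The diagram above (a dataset split into partitions, each holding records) can be mimicked in plain Python. This is a conceptual sketch only: it does not use Spark, and `partition` and `map_partitions` are hypothetical helpers. In Spark each partition could be processed on a different worker; here they are processed one after another.

```python
def partition(records, num_partitions):
    """Distribute records round-robin over a fixed number of partitions."""
    parts = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        parts[i % num_partitions].append(record)
    return parts


def map_partitions(parts, fn):
    """Apply fn to every record, partition by partition."""
    return [[fn(record) for record in part] for part in parts]


parts = partition([1, 2, 3, 4], num_partitions=2)
squared = map_partitions(parts, lambda x: x * x)

# Combine the per-partition results into a single answer.
total = sum(sum(part) for part in squared)
```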
Spark Streaming
Spark Engine
Stream Batches Processed data
Streams with Spark
Lists of RDDs
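The stream-to-batches step can be sketched without Spark: incoming events are grouped by a fixed batch interval, and each resulting batch is then processed like a small dataset. The `micro_batches` helper and the sample stream are assumptions for illustration.

```python
from collections import defaultdict


def micro_batches(events, interval_seconds):
    """Group (timestamp, value) pairs into batches of interval_seconds.

    Intervals in which no event arrived produce no batch in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // interval_seconds)].append(value)
    return [batches[key] for key in sorted(batches)]


# Hypothetical stream of (seconds since start, payload) pairs.
stream = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (4.1, "d")]
batches = micro_batches(stream, interval_seconds=2)

# Each batch is processed as a unit, here by counting its events.
counts = [len(batch) for batch in batches]
```

A longer interval yields fewer, larger batches, so results arrive later but each batch carries less per-batch overhead.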
Demo
Deploying Spark to Azure using HDInsight
Azure Event Hubs
• Capable of streaming large volumes of data
• SDKs available for many languages and platforms: Ruby, Python, Java/Scala, C#, Apache Spark
How does an Azure Event Hub work?
Publisher
Publisher
Publisher
Event Hub
Partition
Partition
Partition
Consumer group
Consumer group
Consumer
Consumer
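The diagram can be approximated with a toy model: publishers send events with a partition key, events with the same key land in the same partition, and each consumer group reads a partition at its own pace. `ToyEventHub` is a hypothetical class that does not use the Azure SDK, and the CRC32-based partition choice is an assumption, not the hashing the service uses.

```python
import zlib


class ToyEventHub:
    """In-memory model of partitions and consumer groups."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        # (consumer group, partition index) -> next offset to read
        self.offsets = {}

    def send(self, partition_key, body):
        """Append the event to the partition chosen by the key's hash."""
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(body)
        return index

    def receive(self, consumer_group, partition_index):
        """Return events this consumer group has not yet seen."""
        key = (consumer_group, partition_index)
        start = self.offsets.get(key, 0)
        events = self.partitions[partition_index][start:]
        self.offsets[key] = start + len(events)
        return events


hub = ToyEventHub(partition_count=3)
first = hub.send("device-1", "e1")
second = hub.send("device-1", "e2")

# Two consumer groups each get their own full view of the partition.
dashboard_events = hub.receive("dashboard", first)
alerting_events = hub.receive("alerting", first)
# A second read by the same group returns only new events (none here).
dashboard_again = hub.receive("dashboard", first)
```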
Demo
Using Azure Event Hub with Spark
Tips for going to production
• When using streams, always have n+1 worker nodes
• More partitions = more speed
• Longer batch intervals are slower, but sometimes better
Thanks!
Willem Meints
Technical Evangelist/Microsoft MVP
@willem_meints