Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL

Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL

PHX Data Conference, 29 Oct 2016

@garyorenstein @memsql

(c) Gary Orenstein and MemSQL

About Me: Gary Orenstein

• MemSQL - real-time database

• Fusion-io (SanDisk) - flash memory solutions

• Compellent (Dell) - enterprise storage

• experience in networking, caching, file systems

• co-author of two O'Reilly books

• Building Real-Time Data Pipelines (2015)

• The Path to Predictive Analytics and Machine Learning (2016)

Digital businesses' inexhaustible demand for faster performance, greater scalability and deeper real-time insight is boosting the market for IMC technologies, which is expected to reach $13 billion by 2020.

- Gartner

MemSQL Basics

The Database Platform For Real-Time Analytics

• In-Memory (plus disk)

• Relational (SQL)

• Multi-model (JSON, Geospatial)

• Distributed (100s of nodes)

Combining streaming, database, and data warehouse workloads

2014

Building Real-Time Platforms with MemSQL and Apache Spark

Strata New York

Combine the power of a real-time transformation tier with the power of a real-time distributed, persistent database, making Spark results more accessible to all

2015

Results

From 24 hours to instant

1 GB/s and 72 TB/day
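
As a quick consistency check on the two throughput figures, the sketch below converts between a per-second rate and a daily volume. It assumes decimal units (1 TB = 1,000 GB) and a constant rate over a full 24 hours; neither assumption is stated on the slide, and the function names are illustrative only. Under those assumptions, 72 TB/day works out to an average of roughly 0.83 GB/s, so the 1 GB/s figure may be a rounded or peak rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day_to_gb_per_s(tb_per_day: float) -> float:
    """Average GB/s needed to move `tb_per_day` terabytes in one day (decimal units assumed)."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


def gb_per_s_to_tb_per_day(gb_per_s: float) -> float:
    """Daily volume in TB if `gb_per_s` were sustained for 24 hours (decimal units assumed)."""
    return gb_per_s * SECONDS_PER_DAY / 1000


if __name__ == "__main__":
    print(f"72 TB/day is about {tb_per_day_to_gb_per_s(72):.2f} GB/s on average")
    print(f"1 GB/s sustained is about {gb_per_s_to_tb_per_day(1):.1f} TB/day")
```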

Streamliner

2016

The Aha Moment

Everything We Know About Data Movement Is Wrong

1. We're finished with batch and the world is moving to streaming and real-time

2. Topologies need to change

3. Messaging semantics need to improve

Familiar data integration patterns centered on physical data movement (bulk/batch data movement, for example) are no longer a sufficient solution for enabling a digital business.

- Gartner

I hate batch processing so much that I won't even use the dishwasher. I just wash, dry, and put away real time.

- Ed Weissman (@edw519)

Charting New Ground With Distributed Data Movement

Building A Robust Streaming Foundation

Renewable Energy and PowerStream Demo

Germany Just Got Almost All of Its Power From Renewable Energy

May 15, 2016

Bloomberg: http://www.bloomberg.com/news/articles/2016-05-16/germany-just-got-almost-all-of-its-power-from-renewable-energy

Investment in renewables reached $286 billion worldwide in 2015

BBC: http://www.bbc.com/news/science-environment-36420750

MemSQL PowerStream

Predicting the global health of wind turbines

Enabling predictive analytics

• Use existing models from SAS

• Create models in Spark MLlib

• Predictive scoring as part of the pipeline
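
To make "predictive scoring as part of the pipeline" concrete, here is a minimal sketch in plain Python. It is not the PowerStream implementation: the logistic model, its coefficients, the field names (`vibration`, `temperature`, `failure_probability`), and the in-memory list standing in for the stream are all hypothetical. The only point it illustrates is that each record is scored in flight, before it reaches the database, rather than in a later batch job.

```python
import math
from typing import Any, Dict, Iterable, Iterator

# Hypothetical coefficients standing in for a model trained elsewhere
# (for example, exported from SAS or fit with Spark MLlib).
WEIGHTS = {"vibration": 1.2, "temperature": 0.05}
BIAS = -6.0


def score(reading: Dict[str, Any]) -> float:
    """Return a failure probability in (0, 1) from a simple logistic model."""
    z = BIAS + sum(weight * reading[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(readings: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Attach a score to each record as it flows through, one record at a time."""
    for reading in readings:
        yield {**reading, "failure_probability": score(reading)}


if __name__ == "__main__":
    # Stand-in for messages arriving from a stream; values are made up.
    incoming = [
        {"turbine_id": 1, "vibration": 0.5, "temperature": 60.0},
        {"turbine_id": 2, "vibration": 4.0, "temperature": 85.0},
    ]
    for row in score_stream(incoming):
        print(row["turbine_id"], round(row["failure_probability"], 3))
```

Because `score_stream` is a generator, it never holds the whole stream in memory. In a real deployment the scored rows would be written to the database by whatever sink the pipeline uses.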

Business Intelligence

Business Intelligence Details

• Natively connect to BI tools like Tableau

• Also Zoomdata, Looker, MicroStrategy

• Business analysts inside your company can use a tool they know and love

DEMO

Join our co-founder and CTO, Nikita Shamgunov, plus engineering and product experts

Thank You!

@garyorenstein @memsql
