Polyglot Processing - An Introduction 1.0

Post on 09-Feb-2017


POLYGLOT PROCESSING – AN INTRODUCTION

Dr. Mohan K. Bavirisetty, Chief Scientist, Modern Renaissance

Agenda

1. Big Data Landscape
2. Lambda vs. Kappa Architecture
3. Spark vs. Storm vs. Flink
4. Demo 1 – Apache Spark
5. Demo 2 – Storm, Kafka and Redis
6. Demo 3 – Flink with Data Stream API?
7. Summary
8. Questions

The purpose of computing is insight not data – Richard Hamming

BIG DATA LANDSCAPE

What is Big Data?

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Source: Gartner Research

What is a Real-time Analytics Platform?

3 Common Kinds of Workloads

1. Batch Operations

2. Micro-batch Operations

3. Real-time Streaming
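The difference between the three workload kinds can be illustrated with a small sketch. This is a plain-Python illustration, not code from the talk or from any of the engines discussed; the function names and the integer events are hypothetical, and real engines add distribution, fault tolerance and time-based triggers on top of this idea.

```python
from typing import Iterable, Iterator, List


def batch_total(events: List[int]) -> int:
    """Batch: the whole bounded data set is available before processing starts."""
    return sum(events)


def micro_batch_totals(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch: buffer a small group of events, then process each group."""
    buffer: List[int] = []
    for event in events:
        buffer.append(event)
        if len(buffer) == batch_size:
            yield sum(buffer)
            buffer = []
    if buffer:  # flush the final, possibly partial, group
        yield sum(buffer)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming: update the result as each individual event arrives."""
    running = 0
    for event in events:
        running += event
        yield running


events = [3, 1, 4, 1, 5]                     # hypothetical event values
print(batch_total(events))                   # one answer for the whole set
print(list(micro_batch_totals(events, 2)))   # one answer per small group
print(list(streaming_totals(events)))        # one answer per event
```

The trade-off the sketch hints at is latency versus overhead: batch gives one result after all data is read, streaming gives a result per event, and micro-batch sits in between.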

“Evidence-based decision-making (aka Big Data) is not just the latest fad, it's the future of how we are going to guide and grow business.” – Kristian Hammond, CTO, Narrative Science

8 Requirements of Real-time Computing

Keep Data Moving

Allow SQL Queries

Handle Stream Imperfections

Generate Predictable Outcomes

Integrate Streaming Data and Stored Data

Guarantee Data Safety and Availability

Partition and Scale Applications Automatically

Process and Respond Instantaneously

How do major data engines compare?

Real-time Streaming Architecture

Berkeley Data Analytics Stack

Polyglot

• One who is versed in many languages

Polyglot Programming

• Different languages, frameworks and services

• Example: Java with Scala, Clojure inside Trident

Polyglot Persistence

• Capacity to store data in multiple formats

• Structured, document, log, GPS

Polyglot Processing

• Refers to the capability to process any kind of data, any kind of workload, any kind of workflow

LAMBDA VS. KAPPA ARCHITECTURES

Lambda Architecture
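As a rough illustration of the idea, the Lambda Architecture is usually described as three layers: a batch layer that recomputes views from the full master data set, a speed layer that covers events the last batch run has not yet seen, and a serving layer that merges both at query time. The sketch below is a minimal in-memory version under those assumptions; the names `batch_view`, `SpeedLayer` and `query` are hypothetical and do not come from the talk or from any framework.

```python
from collections import Counter
from typing import List


def batch_view(master_dataset: List[str]) -> Counter:
    """Batch layer: recompute counts from the full, immutable master data set."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.realtime_view[key] += 1


def query(key: str, batch: Counter, realtime: Counter) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[key] + realtime[key]


# Hypothetical data: three events already in the master data set,
# one more that arrived after the last batch run.
batch = batch_view(["login", "login", "click"])
speed = SpeedLayer()
speed.on_event("login")

print(query("login", batch, speed.realtime_view))
print(query("click", batch, speed.realtime_view))
```

The cost of this design is that the same logic (here, counting) has to be maintained in two code paths, one for batch and one for streaming.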

What is Apache Storm?

Apache Storm is a free and open source distributed real-time computation system that makes it easy to reliably process unbounded streams of data.

Why Apache Storm?

Storm is fast, horizontally scalable, fault-tolerant, easy to set up and operate, and programming-language agnostic.

Apache Storm

Apache Storm can be used to realize an APM Use Case

Apache Spark

Apache Spark is a fast and general engine for large-scale data processing.

• Spark is fast

• Spark is easy

• Spark is extensible

Lambda Implementation with Spark

Kappa Architecture
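As a rough illustration, the Kappa Architecture is usually described as treating everything as a stream: all events go into an append-only log, a single stream-processing path builds the views, and reprocessing is done by replaying the log through new logic instead of running a separate batch layer. The sketch below assumes an in-memory list standing in for a log such as Kafka; `EventLog` and `build_view` are hypothetical names, not part of any real API.

```python
from collections import Counter
from typing import Callable, Iterator, List


class EventLog:
    """Append-only log; an in-memory stand-in for a durable log such as Kafka."""

    def __init__(self) -> None:
        self._events: List[str] = []

    def append(self, event: str) -> None:
        self._events.append(event)

    def replay(self, from_offset: int = 0) -> Iterator[str]:
        """Re-read the retained events starting at the given offset."""
        return iter(self._events[from_offset:])


def build_view(log: EventLog, transform: Callable[[str], str]) -> Counter:
    """Single processing path: stream every logged event through `transform`."""
    view: Counter = Counter()
    for event in log.replay():
        view[transform(event)] += 1
    return view


log = EventLog()
for event in ["Login", "login", "click"]:   # hypothetical events
    log.append(event)

view_v1 = build_view(log, lambda e: e)      # original logic: case-sensitive keys
view_v2 = build_view(log, str.lower)        # revised logic: replay the same log

print(view_v1)
print(view_v2)
```

Changing the logic means starting a second job from offset 0 and switching readers over to the new view once it has caught up, so only one code path has to be maintained.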

Apache Flink

Apache Flink has a unified runtime engine

DEMONSTRATION

SUMMARY

Summary

• Big Data challenges are being met with new and innovative approaches and architectures.

• Lambda Architecture is a pragmatic near-term solution. Fidelity is already implementing it.

• Kappa Architecture could turn out to be the elegant long-term solution to Polyglot Processing.

• Apache Spark, Storm and Flink have their strengths and niche areas of applicability.

• Apache SAMOA, Apache Zeppelin and Tachyon add further value by providing additional capabilities.

Working Toward Analytics Mastery

[Figure: analytics maturity plotted against time, with stages labeled Descriptive, Predictive and Preventive/Prescriptive]

Next Stage of Data Explosion

QUESTIONS?

We do not learn by inference and deduction and the application of mathematics to philosophy, but by direct intercourse …

- Henry David Thoreau

THANK YOU

Appendix – References and Resources

• 8 Requirements of Real-time Stream Processing http://cs.brown.edu/~ugur/8rulesSigRec.pdf

• Design Patterns for Real-Time Streaming Analytics http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/38774

• Big Data: Principles and Best Practices of Scalable Real-time Data Systems. http://bit.ly/1LscB7z

• Real-time Stream Processing Next-Step for Apache Flink http://www.confluent.io/blog/2015/05/06/real-time-stream-processing-the-next-step-for-apache-flink/

• SAMOA – Scalable Advanced Massive Online Analysis http://jmlr.csail.mit.edu/papers/volume16/morales15a/morales15a.pdf

• Lambda Architecture http://lambda-architecture.net/

• Kappa Architecture http://www.kappa-architecture.com/

• Apache Spark http://spark.apache.org/

• Apache Storm https://storm.apache.org/

• Apache Flink https://flink.apache.org/

• Apache SAMOA https://samoa.incubator.apache.org/

• Apache Zeppelin https://zeppelin.incubator.apache.org/

• Tachyon http://tachyon-project.org/