+ All Categories
Home > Documents > Polyglot Processing - An Introduction 1.0

Polyglot Processing - An Introduction 1.0

Date post: 09-Feb-2017
Upload: dr-mohan-k-bavirisetty

POLYGLOT PROCESSING – AN INTRODUCTION

Dr. Mohan K. Bavirisetty, Chief Scientist, Modern Renaissance


Agenda

1. Big Data Landscape
2. Lambda vs. Kappa Architecture
3. Spark vs. Storm vs. Flink
4. Demo 1 – Apache Spark
5. Demo 2 – Storm, Kafka and Redis
6. Demo 3 – Flink with DataStream API
7. Summary
8. Questions

The purpose of computing is insight, not data – Richard Hamming


BIG DATA LANDSCAPE


What is Big Data?

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Source: Gartner Research


What is a Real-time Analytics Platform?


3 Common Kinds of Workloads

1. Batch Operations

2. Micro-batch Operations

3. Real-time Streaming
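To make the difference between the three workloads concrete, here is a minimal, library-free Python sketch. It assumes a toy in-memory list of click events and a running count as the computation; the function names are hypothetical and do not correspond to any engine's API. Batch computes once over the complete input, micro-batch emits an updated result after each small chunk, and streaming emits one after every event.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch(events: List[str]) -> Counter:
    """Batch: wait for the complete dataset, then compute once."""
    return Counter(events)


def micro_batch(events: Iterable[str], size: int) -> Iterator[Counter]:
    """Micro-batch: collect arriving events into small chunks and emit an
    updated result after each chunk is processed."""
    totals: Counter = Counter()
    chunk: List[str] = []
    for event in events:
        chunk.append(event)
        if len(chunk) == size:
            totals.update(chunk)
            chunk.clear()
            yield Counter(totals)  # snapshot of the running result
    if chunk:  # flush a final, partially filled chunk
        totals.update(chunk)
        yield Counter(totals)


def streaming(events: Iterable[str]) -> Iterator[Counter]:
    """Real-time streaming: update and emit the result on every event."""
    totals: Counter = Counter()
    for event in events:
        totals[event] += 1
        yield Counter(totals)


if __name__ == "__main__":
    clicks = ["home", "cart", "home", "checkout", "home"]
    print(batch(clicks))
    print(len(list(micro_batch(clicks, size=2))))  # 3 results: chunks of 2, 2, 1
    print(len(list(streaming(clicks))))            # 5 results: one per event
```

All three end with the same totals; what differs is how often, and how soon, a result becomes available.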

“Evidence-based decision-making (aka Big Data) is not just the latest fad, it's the future of how we are going to guide and grow business.” – Kristian Hammond, CTO, Narrative Science


8 Requirements of Real-time Computing

Keep Data Moving

Allow SQL Queries

Handle Stream Imperfections

Generate Predictable Outcomes

Integrate Streaming Data and Stored Data

Guarantee Data Safety and Availability

Partition and Scale Applications Automatically

Process and Respond Instantaneously
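Two of these requirements, handling stream imperfections and generating predictable outcomes, can be illustrated with one common technique: buffer out-of-order events and release them in event-time order once a watermark has passed them. The sketch below is plain Python and assumes integer timestamps and a fixed, hypothetical `max_delay` bound on lateness; events that arrive later than that bound are set aside rather than silently reordered.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

Event = Tuple[int, str]  # (event_time, payload)


def reorder(events: Iterable[Event], max_delay: int) -> Tuple[List[Event], List[Event]]:
    """Release events in event-time order despite out-of-order arrival.

    The watermark is the highest timestamp seen so far minus max_delay.
    Buffered events at or below the watermark are emitted in order; an event
    whose timestamp is already below the watermark on arrival is too late
    and is collected separately. For a given input sequence the output is
    always the same, which keeps downstream results predictable.
    """
    buffer: List[Event] = []   # min-heap ordered by event time
    emitted: List[Event] = []
    late: List[Event] = []
    max_seen: Optional[int] = None
    for ts, payload in events:
        if max_seen is not None and ts < max_seen - max_delay:
            late.append((ts, payload))
            continue
        heapq.heappush(buffer, (ts, payload))
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - max_delay
        while buffer and buffer[0][0] <= watermark:
            emitted.append(heapq.heappop(buffer))
    # End of (bounded) input: flush whatever is still buffered, in order.
    while buffer:
        emitted.append(heapq.heappop(buffer))
    return emitted, late


if __name__ == "__main__":
    arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (1, "z")]
    in_order, too_late = reorder(arrivals, max_delay=2)
    print(in_order)   # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (6, 'f')]
    print(too_late)   # [(1, 'z')]
```

The `max_delay` bound is the trade-off: a larger value tolerates more disorder but holds results back longer, which works against "Process and Respond Instantaneously".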


How do major data engines compare?


Real-time Streaming Architecture


Berkeley Data Analytics Stack


Polyglot: one who is versed in many languages

Polyglot Programming

• Using different languages, frameworks and services together
• Example: Java with Scala, Clojure inside Trident

Polyglot Persistence

• The capacity to store data in multiple formats
• Structured, document, log, GPS

Polyglot Processing

• The capability to process any kind of data, any kind of workload and any kind of workflow


LAMBDA VS. KAPPA ARCHITECTURES


Lambda Architecture
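The Lambda Architecture, as described by Nathan Marz, splits processing into a batch layer that recomputes views from an immutable master dataset, a speed layer that covers data the batch views have not yet absorbed, and a serving layer that merges the two at query time. The toy class below sketches that flow for page-view counts. The class and method names are hypothetical, and it assumes the batch run completes synchronously, so the speed layer can simply be cleared afterwards; in a real deployment, data arriving while a batch job runs must stay in the speed layer until a later batch covers it.

```python
from collections import Counter
from typing import List


class LambdaPipeline:
    """Toy Lambda Architecture for page-view counts (hypothetical names)."""

    def __init__(self) -> None:
        self.master: List[str] = []              # immutable, append-only master dataset
        self.batch_view: Counter = Counter()     # complete but stale
        self.realtime_view: Counter = Counter()  # fresh but partial

    def ingest(self, page: str) -> None:
        # New data is appended to the master dataset and fed to the speed layer.
        self.master.append(page)
        self.realtime_view[page] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over all data so far.
        self.batch_view = Counter(self.master)
        # Everything ingested so far is now in the batch view, so the speed
        # layer can drop it (valid only because this batch run is synchronous).
        self.realtime_view = Counter()

    def query(self, page: str) -> int:
        # Serving layer: merge the batch view and the real-time view at query time.
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    p = LambdaPipeline()
    for page in ["home", "cart", "home"]:
        p.ingest(page)
    p.run_batch()
    p.ingest("home")          # arrives after the batch run
    print(p.query("home"))    # 3: 2 from the batch view + 1 from the speed layer
```

The cost of this design is visible even in the toy: the counting logic exists twice, once per layer, and both copies must stay consistent.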


What is Apache Storm?

Apache Storm is a free and open source distributed real-time computation system. It makes it easy to reliably process unbounded streams of data.
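Storm applications are built as topologies: spouts emit streams of tuples, and bolts consume, transform and re-emit them. The word-count sketch below mimics that shape with plain Python generators so it runs without a cluster. It is not Storm's Java API, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout role: emit tuples from a source. A real spout would pull from an
    unbounded source (a queue, a log); here any iterable will do."""
    for line in lines:
        yield line


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt role: turn each incoming sentence into zero or more words."""
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word


def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stateful bolt: emit the updated running count for every word seen."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def word_count_topology(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Wire spout -> split bolt -> count bolt. Generators keep the chain lazy:
    # tuples are pulled through one at a time instead of materializing each
    # stage, which is what lets the same wiring consume an unbounded source.
    return count_bolt(split_bolt(sentence_spout(lines)))


if __name__ == "__main__":
    for word, n in word_count_topology(["the cow jumped", "The dog"]):
        print(word, n)
```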


Why Apache Storm?

Storm is fast, horizontally scalable, fault-tolerant, easy to set up and operate, and programming-language agnostic.


Apache Storm


Apache Storm can be used to implement an APM (application performance monitoring) use case


Apache Spark

Apache Spark is a fast and general engine for large-scale data processing.

• Spark is fast

• Spark is easy

• Spark is extensible
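Part of what makes Spark easy to program is that transformations such as `map`, `filter` and `reduceByKey` are lazy: they only extend a lineage, and work happens when an action such as `collect` runs. The class below is a hypothetical, single-process stand-in that sketches just that idea; it has no partitioning, caching or cluster execution, and its snake_case method names are deliberately different from PySpark's.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional

Step = Callable[[Iterator[Any]], Iterator[Any]]


class MiniRDD:
    """Hypothetical single-process stand-in for a Spark RDD.

    Transformations only append a step to the lineage; nothing is computed
    until the collect() action replays the lineage over the source data.
    """

    def __init__(self, source: Iterable[Any], lineage: Optional[List[Step]] = None) -> None:
        self._source = source
        self._lineage = lineage or []

    def _then(self, step: Step) -> "MiniRDD":
        # Return a new dataset; the original is left unchanged (RDDs are immutable).
        return MiniRDD(self._source, self._lineage + [step])

    def map(self, f: Callable[[Any], Any]) -> "MiniRDD":
        return self._then(lambda items: (f(x) for x in items))

    def flat_map(self, f: Callable[[Any], Iterable[Any]]) -> "MiniRDD":
        return self._then(lambda items: (y for x in items for y in f(x)))

    def filter(self, pred: Callable[[Any], bool]) -> "MiniRDD":
        return self._then(lambda items: (x for x in items if pred(x)))

    def reduce_by_key(self, f: Callable[[Any, Any], Any]) -> "MiniRDD":
        def step(items: Iterator[Any]) -> Iterator[Any]:
            acc: dict = {}
            for key, value in items:
                acc[key] = f(acc[key], value) if key in acc else value
            return iter(acc.items())
        return self._then(step)

    def collect(self) -> List[Any]:
        # Action: run every recorded step, in order, over the source.
        items: Iterator[Any] = iter(self._source)
        for step in self._lineage:
            items = step(items)
        return list(items)


if __name__ == "__main__":
    lines = ["to be or", "not to be"]
    counts = (MiniRDD(lines)
              .flat_map(str.split)
              .map(lambda w: (w, 1))
              .reduce_by_key(lambda a, b: a + b))
    print(counts.collect())  # [('to', 2), ('be', 2), ('or', 1), ('not', 1)]
```

Keeping the lineage rather than intermediate data is also what allows a lost result to be recomputed from its source, which is the basis of Spark's fault tolerance.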


Lambda Implementation with Spark


Kappa Architecture
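The Kappa Architecture, proposed by Jay Kreps, drops the separate batch layer: all data goes through a single stream-processing path, and reprocessing means replaying an append-only log (such as a Kafka topic) through a new version of the job, then switching readers to its output once it has caught up. The sketch below models the log as a Python list and each job version as an object that tracks its own read offset; all names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Logic = Callable[[Dict[str, int], Event], None]


class StreamJob:
    """One version of a streaming job reading an append-only log by offset."""

    def __init__(self, logic: Logic) -> None:
        self.logic = logic
        self.state: Dict[str, int] = {}  # this version's output table
        self.offset = 0                  # next log position to read

    def poll(self, log: List[Event]) -> Dict[str, int]:
        # Consume everything appended since the last poll, in log order.
        while self.offset < len(log):
            self.logic(self.state, log[self.offset])
            self.offset += 1
        return self.state


def count_orders(state: Dict[str, int], event: Event) -> None:
    user = event["user"]
    state[user] = state.get(user, 0) + 1


def total_spend(state: Dict[str, int], event: Event) -> None:
    # "v2" of the logic: sum amounts instead of counting orders.
    user = event["user"]
    state[user] = state.get(user, 0) + event["amount"]


if __name__ == "__main__":
    log: List[Event] = [{"user": "ann", "amount": 30}, {"user": "bob", "amount": 5}]
    v1 = StreamJob(count_orders)
    v1.poll(log)                    # the live job keeps up with the log
    v2 = StreamJob(total_spend)     # reprocessing: the new version starts at offset 0
    log.append({"user": "ann", "amount": 12})
    v1.poll(log)
    serving = v2.poll(log)          # caught up; readers switch to v2, v1 can stop
    print(serving)                  # {'ann': 42, 'bob': 5}
```

Unlike the Lambda sketch, there is only one code path: the same `poll` loop serves both live processing and full reprocessing.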


Apache Flink


Apache Flink has a unified runtime engine for batch and stream processing
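One consequence of a unified runtime is that the same operator can run over a bounded input (a batch is just a finite stream) or an unbounded one. The tumbling-window sum below sketches that idea in plain Python rather than Flink's DataStream API; it assumes events arrive in timestamp order and uses a hypothetical `size` parameter for the window length.

```python
from itertools import count, islice
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_sum(events: Iterable[Tuple[int, int]], size: int) -> Iterator[Tuple[int, int]]:
    """Sum values per fixed, non-overlapping event-time window of length size.

    Assumes events arrive in timestamp order. A window's (start, total) is
    emitted when the first event of a later window arrives; on bounded input
    the last open window is emitted when the input ends. Windows with no
    events produce no output.
    """
    current: Optional[int] = None
    total = 0
    for ts, value in events:
        start = ts - ts % size
        if current is not None and start != current:
            yield current, total
            total = 0
        current = start
        total += value
    if current is not None:  # only reached when the input is bounded
        yield current, total


if __name__ == "__main__":
    # Bounded input (a "batch"): a finite list.
    print(list(tumbling_sum([(0, 1), (3, 2), (5, 4), (12, 1)], size=5)))
    # -> [(0, 3), (5, 4), (10, 1)]
    # Unbounded input (a "stream"): one event per time unit, forever.
    stream = ((t, 1) for t in count())
    print(list(islice(tumbling_sum(stream, size=5), 3)))
    # -> [(0, 5), (5, 5), (10, 5)]
```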


DEMONSTRATION


SUMMARY


Summary

• Big Data challenges are being met with new and innovative approaches and architectures.

• Lambda Architecture is a pragmatic near-term solution. Fidelity is already implementing it.

• Kappa Architecture could turn out to be a long-term, elegant solution to Polyglot Processing.

• Apache Spark, Storm and Flink have their strengths and niche areas of applicability.

• Apache SAMOA, Apache Zeppelin and Tachyon add further value by providing additional capabilities.


Working Toward Analytics Mastery

[Figure: analytics maturity over time, progressing from Descriptive to Predictive to Preventive/Prescriptive]


Next Stage of Data Explosion


QUESTIONS?

We do not learn by inference and deduction and the application of mathematics to philosophy, but by direct intercourse …

- Henry David Thoreau


THANK YOU


Appendix – References and Resources

• 8 Requirements of Real-time Stream Processing http://cs.brown.edu/~ugur/8rulesSigRec.pdf

• Design Patterns for Real-Time Streaming Analytics http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/38774

• Big Data: Principles and Best Practices of Scalable Real-time Data Systems. http://bit.ly/1LscB7z

• Real-time Stream Processing: The Next Step for Apache Flink http://www.confluent.io/blog/2015/05/06/real-time-stream-processing-the-next-step-for-apache-flink/

• SAMOA – Scalable Advanced Massive Online Analysis http://jmlr.csail.mit.edu/papers/volume16/morales15a/morales15a.pdf

• Lambda Architecture http://lambda-architecture.net/

• Kappa Architecture http://www.kappa-architecture.com/

• Apache Spark http://spark.apache.org/

• Apache Storm https://storm.apache.org/

• Apache Flink https://flink.apache.org/

• Apache SAMOA https://samoa.incubator.apache.org/

• Apache Zeppelin https://zeppelin.incubator.apache.org/

• Tachyon http://tachyon-project.org/
