Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data...

Post on 15-Jul-2015

Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven

Alexandre Vasseur, Pivotal @PivotalFrance

© Copyright 2015 Pivotal. All rights reserved.

If you have one thing to do

Store Massive Data Sets

Achieve Continuous Innovation at Scale

Becoming Data Driven with Apps

Data Driven Apps

AGILE DEV & DATA SCIENCE: MODERN, COLLABORATIVE

APP & DEV PLATFORM: MODERN, CLOUD-ORIENTED & OPEN

DATA FABRIC: MODERN, CLOUD-ORIENTED & OPEN

The Big Data Problem

Fragmentation

Constraints

Complexity

Pivotal + Hortonworks Alliance

•  Started July 2014 around Ambari collaboration

•  Announcing Pivotal Big Data Suite on Hortonworks Data Platform

•  Advanced support from world’s leading Hortonworks support services

•  Joint engineering efforts and enhanced Pivotal HD

ODP - Standardize Hadoop Ecosystem

•  Deliver ODP Core to build a versioned, packaged, tested set of Hadoop components.

•  Focus on developing a platform, rather than projects

•  Initial scope on Apache Hadoop HDFS / MR / YARN / Ambari

Remove vendor lock-in

Ecosystem Effect

Shorter Innovation Cycles

http://opendataplatform.org

Open Sourced but not just Hadoop

•  Open sourcing all Pivotal Big Data Suite components

–  Pivotal GemFire - premium in-memory NoSQL database

–  Pivotal HAWQ - world’s leading SQL compliant enterprise SQL on Hadoop

–  Pivotal Greenplum Database - advanced enterprise MPP analytic database with Hadoop interconnect

–  SpringXD - Unified, distributed, and extensible system for data driven application development

HAWQ SQL on Hadoop

PROVEN AT SCALE

PRODUCTIVE

NATIVE on HADOOP / ODP

OPEN & EXTENSIBLE

HAWQ SQL on Hadoop

•  10+ years R&D in Massively Parallel SQL

•  SQL engine for peta-scale analytics in the world’s largest industries

•  Mature cost-based query optimizer

•  Full SQL semantics

•  Rich ecosystem of ELT/dataviz/BI & partners

•  PL/*, built-in analytics, R native framing

•  All Hadoop formats (gz, Parquet, HAWQ etc)

•  Data node short circuit reads (colocated, not M/R based)

•  Predicate pushdown to Hive, HBase

•  HAWQ PXF: Query federation to NoSQL, DB, etc
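The "predicate pushdown" item can be illustrated with a small, self-contained Python sketch. It is not HAWQ or PXF code: `ExternalSource`, `rows_shipped`, and both query functions are hypothetical names made up for this illustration, and a plain in-memory list stands in for an external store such as Hive or HBase. The sketch only shows the general idea: when the filter is evaluated at the source, fewer rows travel to the query engine.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class ExternalSource:
    """Hypothetical stand-in for an external store (not a real PXF API)."""
    rows: List[Row]
    rows_shipped: int = 0  # counts rows sent from the source to the engine

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        # If a predicate is supplied, it is evaluated here, at the source,
        # so only matching rows are shipped.
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def query_without_pushdown(source: ExternalSource, predicate: Predicate) -> List[Row]:
    # The engine pulls every row and filters afterwards.
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source: ExternalSource, predicate: Predicate) -> List[Row]:
    # The engine hands the filter to the source.
    return list(source.scan(predicate))


def make_rows() -> List[Row]:
    # Illustrative data only: 1000 rows, one in ten tagged "FR".
    return [{"id": i, "country": "FR" if i % 10 == 0 else "US"} for i in range(1000)]


def is_french(row: Row) -> bool:
    return row["country"] == "FR"


if __name__ == "__main__":
    naive, pushed = ExternalSource(make_rows()), ExternalSource(make_rows())
    same = query_without_pushdown(naive, is_french) == query_with_pushdown(pushed, is_french)
    print("same result:", same)
    print("rows shipped without pushdown:", naive.rows_shipped)
    print("rows shipped with pushdown:", pushed.rows_shipped)
```

Both functions return the same rows; the difference is only in how many rows leave the source, which is the cost a pushdown-capable connector aims to reduce.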

SpringXD

Data from anywhere, to anywhere

Real time & batch

Ingest + analytics + jobs orchestration

Developer friendly

Built in connectors

With / without Spark

DSL

Your choice of Hadoop

Your choice of messaging

Standalone, YARN & outside Hadoop

Simplify Data Driven Applications

•  PaaS with NoSQL & Big Data choices built-in

•  Emergence of vertical services: Mobile, IoT, …

Data centric runtimes built in

–  Java/PHP/Node.js/Ruby

–  Python

–  R/Shiny

–  Scala

–  SpringXD

Large choice of data services

–  DB, clustered MySQL etc

–  Memcache, Redis etc

–  GemFire, Cassandra etc

–  Hadoop, Greenplum etc

Can run virtualized inside PaaS

Can run multi-tenant-ified alongside PaaS

DEMO

[Architecture diagram: Pivotal Big Data Suite]

Applications: Analytic Apps, Online Apps

Components shown: Cloud Foundry Data Services; SpringXD (Stream Processing/scoring); RabbitMQ; Spark; Redis (Key Value Store); GemFire (JSON/Object in memory data grid); GreenplumDB (Analytics DW); HAWQ (SQL on Hadoop); HBase; Hive; HDFS; PHD (or any ODP Core-based Hadoop Distribution)

Connector labels: PXF (Filtered Pushdown); Direct Store Federated; GPHDFS; Write behind Persistence

The New Data Imperatives

Converged Data & Cloud

Open Data-Driven Apps

A NEW PLATFORM FOR A NEW ERA

Meet us at the booth! Come and do a “HAWQ in 2 min” lab

Win Beats Solo2 headphones!