
OOP 2014

Date post: 27-Jan-2015
Upload: emil-andreas-siemes
Description:
Data-Lake talk at OOP 2014
Transcript
Page 1: OOP 2014

Hortonworks: We Do Hadoop.

Our mission is to enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop

Emil A. Siemes

[email protected]

Solution Engineer

January 2014

Page 2: OOP 2014

Our Mission: Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop

Our Commitment

Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process

Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind

Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills

Headquarters: Palo Alto, CA
Employees: 300+ and growing

Trusted Partners

Page 3: OOP 2014

A Traditional Approach Under Pressure

[Diagram: traditional data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020

Page 4: OOP 2014

Emerging Modern Data Architecture

[Diagram: emerging modern data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST

Page 5: OOP 2014

Drivers of Hadoop Adoption

From NEW types of Data (or existing types for longer)

New Business Applications

Page 6: OOP 2014

Most Common NEW TYPES OF DATA

1. Sentiment: Understand how your customers feel about your brand and products, right now

2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website

3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines

4. Geographic: Analyze location-based data to manage operations where they occur

5. Server Logs: Research logs to diagnose process failures and prevent security breaches

6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents

+ Keep existing data longer!

Page 7: OOP 2014

Drivers of Hadoop Adoption

A Modern Data Architecture
Complement your existing data systems: the right workload in the right place

Architectural

New Business Applications

Page 8: OOP 2014

Let’s build a Data Lake…

Instructions on: hadoopwrangler.com

Page 9: OOP 2014

HDP Data Lake Solution Architecture

[Diagram: Data Lake HDP Grid of compute & storage nodes on YARN, with AMBARI, and Knox for perimeter level security]

SOURCE DATA: ClickStream Data, Sales Transaction/Data, Product Data, Marketing/Inventory, Social Data, EDW
Source interfaces: File, JMS, REST, HTTP, Streaming

Step 1: Extract & Load
Ingestion: SQOOP, FLUME, Web HDFS, NFS

Step 2: Model/Apply Metadata
HCATALOG (table & user-defined metadata)

Step 3: Transform, Aggregate & Materialize
Data processing: HIVE, PIG, Mahout
Processing engines: TEZ, MR2

Step 4: Schedule and Orchestrate
Oozie (Batch scheduler)

Manage Steps 1-4: Data Lifecycle with Falcon
FALCON (data pipeline & flow management)

Use Case Type 1: Materialize & Exchange
Exchange: HBase Client, Sqoop/Hive
Downstream Data Sources: OLTP, HBase, EDW (Teradata), Storm, SAS, Elastic Search

Use Case Type 2: Explore/Visualize
INTERACTIVE: Hive Server (Tez/Stinger)
Query/Analytics/Reporting Tools: Tableau/Excel, Datameer/Platfora/SAP

YARN Apps: Opens up Hadoop to many new use cases (Stream Processing, Real-time Search, MPI)
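
The four steps named on this slide can be pictured as a small pipeline. The following is only a sketch in plain Python, not how Sqoop, HCatalog, Hive or Oozie are actually invoked: the sample clickstream rows, the `SCHEMA` mapping and all function names are hypothetical stand-ins chosen for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw clickstream sample; stands in for data landed by an ingestion tool.
RAW_CLICKS = """user,page,ms
u1,/home,120
u2,/home,340
u1,/cart,95
"""

# Hypothetical table metadata: column name -> type (the role a metadata catalog plays).
SCHEMA = {"user": str, "page": str, "ms": int}


def extract_and_load(raw_text):
    """Step 1: read the raw rows as untyped strings."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def apply_metadata(rows, schema):
    """Step 2: cast every field to the type the schema declares."""
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in rows]


def transform_and_aggregate(rows):
    """Step 3: materialize the total time spent per page."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["page"]] += row["ms"]
    return dict(totals)


def run_pipeline(raw_text):
    """Step 4: run the steps in order; a scheduler would trigger this call."""
    rows = extract_and_load(raw_text)
    typed = apply_metadata(rows, SCHEMA)
    return transform_and_aggregate(typed)


if __name__ == "__main__":
    print(run_pipeline(RAW_CLICKS))
```

Keeping each step a separate function mirrors the slide's idea that loading, modelling, transforming and scheduling are handled by different components.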

Page 10: OOP 2014

Hadoop 2: The Introduction of YARN

Store all data in a single place, interact in multiple ways

1st Gen of Hadoop: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

HADOOP 2: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
• Standard Query Processing: Hive, Pig
• Batch: MapReduce
• Interactive: Tez
• Online Data Processing: HBase, Accumulo
• Real Time Stream Processing: Storm
• others
• Efficient Cluster Resource Management & Shared Services (YARN)
• Redundant, Reliable Storage (HDFS)

Page 11: OOP 2014

Let’s start simple…

• A solution unifying all data sources of a mobile App
  – Allowing analytics over all data in one place
  – In real time and long term

• Mobile Apps have multiple channels for data:
  – Data created on the handset (e.g. geo location)
  – Data created on servers accessed by the mobile app (e.g. app data, logs)
  – Data from backend services (e.g. RDBMS)
  – Store data (e.g. iTunes Connect, Google Play)
  – Social data (Twitter, App Reviews, etc.)
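
To make "all data in one place" concrete, here is a minimal sketch that merges records from several channels into one time-ordered stream. It assumes each channel already delivers events sorted by timestamp; the `Event` type, the channel names and the sample payloads are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True)
class Event:
    timestamp: int  # seconds since some epoch
    channel: str    # where the record came from
    payload: str    # raw content, left uninterpreted here


# Hypothetical sample data, one list per channel, each sorted by timestamp.
HANDSET = [
    Event(100, "handset", "geo 48.14,11.58"),
    Event(160, "handset", "geo 48.15,11.57"),
]
SERVER_LOGS = [Event(110, "server", "GET /harbours 200")]
SOCIAL = [Event(130, "social", "tweet: love this app")]


def unify(*channels):
    """Merge already-sorted channels into a single timeline."""
    return list(merge(*channels, key=lambda event: event.timestamp))


if __name__ == "__main__":
    for event in unify(HANDSET, SERVER_LOGS, SOCIAL):
        print(event.timestamp, event.channel, event.payload)
```

Once every record carries a timestamp and its channel of origin, questions that span channels can be asked against a single collection.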

Page 12: OOP 2014

Why Should We Care?

• How much revenue did I make? (Not as easy to answer as one might think)
• Where are my customers now?
• Can you fulfill requirements from the business like: ”Tell me when our customers are in a coffee shop so we can offer them e.g. Wifi”?
• What are my customers thinking about my app/brand?
• Are the ones complaining really using it (correctly)?
• How can I support marketing activities?
• How can I evaluate local marketing activities?
• Does positive/negative sentiment affect my downloads?
• Will my servers be able to deal with the load in 3 months?
• …
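
One of the questions above, whether sentiment affects downloads, could be approached first with a simple correlation once both series sit side by side. The sketch below computes a Pearson correlation coefficient in plain Python; the daily sentiment scores and download counts are invented sample values, and the function name `pearson` is an illustrative choice.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily values: average sentiment score and download count per day.
SENTIMENT = [0.1, 0.4, -0.2, 0.6, 0.3]
DOWNLOADS = [120, 150, 90, 180, 140]

if __name__ == "__main__":
    print(f"correlation: {pearson(SENTIMENT, DOWNLOADS):.2f}")
```

A coefficient near +1 or -1 only indicates that the two series move together; it does not show that sentiment causes the change in downloads.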

Page 13: OOP 2014

Design Goals

• Use as much of our own stack as possible
• Minimize dependencies on stacks beyond Hadoop
  – Still make it useful and complete
• Make it fit into an 8GB MacBook/Laptop
• Release early & release often

Page 14: OOP 2014

iiCaptain

Page 15: OOP 2014

Types Of Data For iiCaptain

• Geo location data
• Store Data
  – iTunes Connect, Google Play, Amazon via AppAnnie
• Twitter
• RDBMS (Sqoop)
• Logs

Page 16: OOP 2014

iiCaptain’s Data Ocean / Data Lake

Page 17: OOP 2014

More Details

Page 18: OOP 2014

Analytics

Page 19: OOP 2014

SQL Interactive Query & Apache Hive

Key Services: Platform, operational and data services essential for the enterprise

Skills: Leverage your existing skills: development, analytics, operations

Integration: Interoperable with existing data center investments

Stinger Initiative: Broad, community based effort to deliver the next generation of Apache Hive

Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)

Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB

SQL: Support broadest range of SQL semantics for analytic applications against Hadoop

Apache Hive
• The de facto standard for Hadoop SQL access
• Used by your current data center partners
• Built for batch AND interactive query

Page 20: OOP 2014

Build Process, Shining With Savanna

Page 21: OOP 2014

Roadmap

- Servlet Engine in YARN
- Project Savanna: Continuous Delivery end-2-end
- Sentiment Analysis with Flume/Hive and App Reviews
- Knox
- Falcon
- Phoenix

Page 22: OOP 2014

HDP 2.0: Enterprise Hadoop Platform

Hortonworks Data Platform (HDP)

• The ONLY 100% open source and most current platform

• Integrates full range of enterprise-ready services

• Certified and tested at scale

• Engineered for deep ecosystem interoperability

[Diagram: HORTONWORKS DATA PLATFORM (HDP)]
OPERATIONAL SERVICES: AMBARI, FALCON*, OOZIE
DATA SERVICES: HIVE & HCATALOG, PIG, HBASE
LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS, KNOX*
CORE SERVICES: HDFS, YARN, MAPREDUCE, TEZ
Functional layers: Storage, Resource Management, Process, Data Movement, Data Access, Cluster Mgmnt, Dataset Mgmnt, Schedule
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
OS/VM, Cloud, Appliance

Page 23: OOP 2014

Hortonworks: The Value of “Open” for You

Validate & Try
1. Download the Hortonworks Sandbox
2. Learn Hadoop using the technical tutorials
3. Investigate a business case using the step-by-step business case scenarios
4. Validate YOUR business case using your data in the sandbox

Connect With the Hadoop Community: We employ a large number of Apache project committers & innovators so that you are represented in the open source community

Avoid Vendor Lock-In: Hortonworks Data Platform remains as close to the open source trunk as possible and is developed 100% in the open so you are never locked in

The Partners You Rely On, Rely On Hortonworks: We work with partners to deeply integrate Hadoop with data center technologies so you can leverage existing skills and investments

Certified for the Enterprise: We engineer, test and certify the Hortonworks Data Platform at scale to ensure the reliability and stability you require for enterprise use

Support from the Experts: We provide the highest quality of support for deploying at scale. You are supported by hundreds of years of Hadoop experience

Engage
1. Execute a Business Case Discovery Workshop with our architects
2. Build a business case for Hadoop today

