
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Apache Hadoop, current status & future projects….

Spring 2014 Version 1.4

Simon Gregory Director Strategic Alliances & Business Development. Hortonworks EMEA.

Page 2

No apologies…
•  A recap on Apache Hadoop.

No excuses…
•  Current Hadoop eco-system.
•  Customer Adoption Methods.
•  What’s being worked on.
•  What that means to you.

Page 3

Hadoop: A quick re-cap.

Page 4

Hadoop stores and processes the data you currently do not or cannot….

Cost Profile. Data Structure.

•  OLTP, ERP, CRM Systems
•  Unstructured documents, emails
•  Clickstream
•  Server logs
•  Sentiment, Web Data
•  Sensor, Machine Data
•  Geolocation

Page 5

Hadoop enables scalable compute & storage with a compelling cost profile….

[Chart: Fully-loaded Cost Per Raw TB of Data (Min–Max Cost), on an axis running from $0 to $180,000, comparing MPP, SAN, Engineered System, NAS, Hadoop and Cloud Storage]

Page 6

Hadoop enables scalable compute & storage for all data structures….

Current Reality: Apply schema on write
•  Dependent on IT
•  Repeatable Process: SQL
   –  Determine list of questions
   –  Design solutions
   –  Collect structured data
   –  Ask questions from list
   –  Detect additional questions

Augment w/ Hadoop: Apply schema on read
•  Support range of access patterns to data stored in HDFS: polymorphic access
•  Iterate over structure
•  Transform and Analyze
•  Batch, Interactive, Real-time, In-memory
•  Right Engine, Right Job
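The contrast between schema on write and schema on read can be illustrated with a minimal Python sketch. It is not taken from the slides: the sample records, the `read_with_schema` helper and the two "views" are hypothetical, and plain Python lists stand in for files stored in HDFS. The point it shows is that the raw data is stored untouched and each reader applies its own schema when it reads.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart", "ms": 340, "referrer": "ad"}',
    'not json at all',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep and cast only the requested fields."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # this reader cannot interpret the line, so it skips it
        if not isinstance(record, dict):
            continue
        # Missing fields become None instead of rejecting the record.
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two different questions asked of the same stored data, each with its own schema.
latency_view = list(read_with_schema(RAW_LINES, {"page": str, "ms": int}))
referrer_view = list(read_with_schema(RAW_LINES, {"user": str, "referrer": str}))
```

Under schema on write, the `referrer` field and the malformed line would have needed a decision before loading; here a new question only needs a new schema passed to the reader.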

Page 7

Hadoop alone is not the answer!

Page 8

[Diagram: Hadoop alongside existing systems]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
DATA SYSTEM
•  REPOSITORIES: RDBMS, EDW, MPP
•  Hadoop: Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
Data types: OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geolocation

Page 9

Hadoop: Current Eco-system

Page 10

Apache Hadoop: Driven by the community

[Chart: component versions per HDP release, grouped under Governance & Integration, Data Access, Data Management, Security and Operations]

Releases: HDP 1.3 (May 2013), HDP 2.0 (October 2013), HDP 2.1 (April 2014)

Components: Solr, Hadoop & YARN, Pig, Tez, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Mahout, Ambari, Storm, Flume, Knox, Phoenix, Accumulo, Falcon (0.5.0)

Version labels, in extraction order: 2.2.0, 1.1.2, 0.11.0, 0.11.0, 0.12.0, 0.12.0, 2.4.0, 0.12.1, 0.13.0, 0.94.6, 0.96.1, 0.98.0, 0.9.1, 0.7.0, 0.8.0, 0.9.0, 4.7.2, 1.4.3, 1.4.4, 1.3.1, 1.4.0, 1.2.5, 1.4.4, 1.5.1, 3.3.2, 4.0.0, 3.4.5, 0.4.0, 0.4.0, 4.0.0, 1.5.1

Page 11

The days of Hadoop = MR & HDFS are over.

GOVERNANCE & INTEGRATION: Load data and manage according to policy
OPERATIONS: Deploy and effectively manage the platform
DATA MANAGEMENT: Store and process all of your Corporate Data Assets
DATA ACCESS: Access your data simultaneously in multiple ways (batch, interactive, real-time)
SECURITY: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
YARN: Data Operating System

PRESENTATION & APPLICATION: Enable both existing and new applications to provide value to the organization
ENTERPRISE MGMT & SECURITY: Empower existing operations and security tools to manage Hadoop
DEPLOYMENT OPTIONS: Provide deployment choice across physical, virtual, cloud

Page 12

Virtuous cycle when development & fixed issues are done upstream & stable project releases flow downstream

Upstream Community Projects: Apache Hadoop, Apache Hive, Apache HBase, Apache Pig, Apache Falcon, Apache Knox, Apache Storm
•  Design & Develop
•  Test & Patch
•  Release

Downstream Enterprise Product
•  Integrate & Test
•  Package & Certify
•  Distribute

Exchanged between the two: Stable Project Releases, Fixed Issues

Certified at scale using the most advanced Hadoop test bed on the planet
•  1000’s of production nodes at Yahoo!
•  Over 1500 unit & system tests

Page 13

YARN is the central point of innovation to tie the Hadoop stack together

Architectural consistency
•  Platform capabilities leverage YARN as the common data operating system
•  The common integration point for all data processing engines
   –  Community (e.g. Storm, Spark, etc)
   –  Commercial (e.g. SAS)

[Diagram: Hadoop stack]
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS: Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Others (In-Memory Analytics, ISV engines), Batch (Map Reduce)
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 … N
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)

Page 14

YARN: Yet Another Resource Negotiator
The data operating system of Hadoop that allows multiple processing engines to access data stored in Hadoop with predictable levels of service

[Diagram: Hadoop stack]
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS: Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Others (In-Memory Analytics, ISV engines), Batch (Map Reduce)
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 … N
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)

Avoid Hadoop Silos & Reduce TCO

•  Single Cluster, Shared Data Set, Multiple Workloads

•  Support a range of access patterns: batch, interactive, online, streaming, real-time and more
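One way to picture "Single Cluster, Shared Data Set, Multiple Workloads" with predictable levels of service is a toy capacity model in Python. This is only an illustrative sketch, not YARN's scheduler or its API: the `allocate` function, the queue names and the numbers are all hypothetical. Each workload gets a guaranteed share of cluster memory, and capacity left idle by one workload is lent to others that want more.

```python
def allocate(total_gb, guarantee_pct, demand_gb):
    """Toy model: guaranteed shares first, then lend idle capacity.

    total_gb       cluster memory available (hypothetical figure)
    guarantee_pct  {queue: guaranteed percent of the cluster}
    demand_gb      {queue: memory the workload currently asks for}
    """
    # Step 1: every queue receives its demand, capped at its guaranteed share.
    granted = {
        queue: min(demand_gb.get(queue, 0), total_gb * pct // 100)
        for queue, pct in guarantee_pct.items()
    }
    spare = total_gb - sum(granted.values())

    # Step 2: idle capacity is lent, in queue order, to unmet demand.
    for queue in guarantee_pct:
        extra = min(spare, demand_gb.get(queue, 0) - granted[queue])
        granted[queue] += extra
        spare -= extra
    return granted


# Batch, interactive and streaming workloads sharing one 1000 GB cluster.
shares = allocate(
    1000,
    {"batch": 50, "interactive": 30, "streaming": 20},
    {"batch": 800, "interactive": 100, "streaming": 200},
)
```

In this example the interactive workload uses only part of its guarantee, so the batch workload borrows the idle 200 GB, while streaming still receives its full guaranteed share. Separate silo clusters could not share that idle capacity.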

Page 15

Supported with KEY Enterprise Capabilities

Data Center Requirements
Apache Hadoop has expanded to deliver the core requirements of any data platform across Governance, Security and Operations

•  Data Governance: Govern and apply policy to the data lifecycle. Integrate and move data in and out of Hadoop
•  Data Security: Authenticate, authorize and provide accountability of data access. Protect data at rest and in motion
•  Operations: Provision, manage and monitor cluster resources. Maintain and improve performance

[Diagram: Hadoop stack]
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS: Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Others (In-Memory Analytics, ISV engines), Batch (Map Reduce)
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 … N
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)

Page 16

Customer Adoption Methods.

Page 17

Getting started..

On-Premise:
•  Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
•  Hortonworks HDP: http://hortonworks.com/hdp/downloads/

Cloud:
•  Microsoft HDInsight: http://azure.microsoft.com/en-us/pricing/free-trial/
•  Rackspace: http://www.rackspace.com/big-data/
•  Amazon: http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/

Page 18

[Diagram: adoption milestones plotted against Potential Value, in phases numbered 1 to 4]

Milestones: Awareness, Interest, Education; Evaluation – Business Value; Evaluation – Technical; Point Deployment; Point Production; Enterprise Deployment; Enterprise Production; Industry Leadership

Value stages: Operational Value; Strategic Value; Data-Driven Organization

Typical elapsed time* from start of phase 1 in months: 2-6, 9-15, 18-24

* Timeline varies by company size. Often smaller or focused online businesses achieve milestones at the shorter end of the range.

Page 19

New Analytic Applications from New Types of Data

Data type columns: Sentiment & Web; Clickstream & Behavior; Machine & Sensor; Geographic; Server Logs; Structured & Unstructured
(Each ✔ marks an applicable data type; the column positions of the marks were not preserved.)

INDUSTRY: USE CASE

Financial Services
•  New Account Risk Screens ✔ ✔
•  Trading Risk ✔
•  Insurance Underwriting ✔ ✔ ✔

Telecom
•  Call Detail Records (CDR) ✔ ✔
•  Infrastructure Investment ✔ ✔
•  Real-time Bandwidth Allocation ✔ ✔ ✔

Retail
•  360° View of the Customer ✔ ✔ ✔
•  Localized, Personalized Promotions ✔
•  Website Optimization ✔

Manufacturing
•  Supply Chain and Logistics ✔
•  Assembly Line Quality Assurance ✔
•  Crowd-sourced Quality Assurance ✔

Healthcare
•  Use Genomic Data in Medical Trials ✔ ✔ ✔
•  Monitor Patient Vitals in Real-Time

Pharmaceuticals
•  Recruit and Retain Patients for Drug Trials ✔ ✔
•  Improve Prescription Adherence ✔ ✔ ✔ ✔

Oil & Gas
•  Unify Exploration & Production Data ✔ ✔ ✔ ✔
•  Monitor Rig Safety in Real-Time ✔ ✔ ✔

Government
•  ETL Offload/Federal Budgetary Pressures ✔ ✔
•  Sentiment Analysis for Government Programs ✔

Page 20

What’s being worked on? The future for Hadoop….

Page 21

New Projects:
http://hortonworks.com/labs/
•  Storm…
•  Spark (ml)
•  Slider
•  Falcon
•  XASecure
•  Cascading

Deployment Choice: Linux, Windows, On-Premise, Cloud

[Diagram: Hadoop stack]
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS: Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Others (In-Memory Analytics, ISV engines), Batch (Map Reduce)
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 … N
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)

Page 22

What that means to you.

Page 23

Pace of innovation…. Fit for purpose…. Integrated….

•  Open Source: Drive innovation in the open exclusively via the Apache community-driven open source process.
•  Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind.
•  Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills.

Page 24

Questions….

