© Hortonworks Inc. 2012
Hortonworks & Systems Integrators
Mitch Ferguson, VP, Business Development
Rikin Shah, Director, Field Engineering
September 5, 2012
Big data changes the game
[Chart: data volume grows from megabytes to petabytes as sources expand beyond ERP, CRM, and web transaction data (purchase details, purchase and payment records, offer details, support contacts, customer touches, segmentation) into big-data sources: web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, user-generated content, mobile web, SMS/MMS sentiment, external demographics, HD video/audio/images, speech-to-text, product/service logs, social interactions & feeds, business data feeds, user click streams, sensors/RFID/devices, and spatial & GPS coordinates. Axis: increasing data variety and complexity.]
Transactions + Interactions + Observations = BIG DATA
Hortonworks Snapshot
The industry-leading and only 100% open source Apache Hadoop distribution
Most experienced open source leadership team:
– Rob Bearden, CEO (JBoss, SpringSource, i2, Oracle)
– Shaun Connolly, VP Strategy (VMware, SpringSource, Red Hat, JBoss)
– Mitch Ferguson, VP Business Development (SpringSource, VMware)
– John Kreisa, VP Marketing (Red Hat, Cloudera, MarkLogic, Business Objects)
– Ari Zilka, CPO (Terracotta, Accenture, Walmart.com)
– Greg Pavlik, VP Engineering (Oracle SOA & Integration platform)
Business model focused on customer success: Hadoop support, services & training
– Subscription support for Hortonworks Data Platform
– Training business: private and public classes available for developers & administrators
• Headquarters Sunnyvale, CA
• 100+ Employees
• Formed with core Apache Hadoop engineering team from Yahoo!
• 40+ engineers and architects including 25+ Hadoop committers
Hortonworks Business Strategy
Enable the next gen data management platform
• Accelerate the adoption of Apache Hadoop
• Create a vibrant ecosystem – ISVs, IHVs, Systems Integrators
• Provide world-class enterprise Support & Training
We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
Hortonworks Vision & Role
Be diligent stewards of the open source core 1
Be tireless innovators beyond the core 2
Provide robust data platform services & open APIs 3
Enable the ecosystem at each layer of the stack 4
Make the platform enterprise-ready & easy to use 5
Enabling Hadoop as Enterprise Big Data Platform
Data Platform Services & Open APIs (developer view)
• Applications, business tools, development tools, open APIs and access
• Data movement & integration, data management systems, systems management
• Installation & configuration, administration, monitoring, high availability, replication, multi-tenancy, ..
• Metadata, indexing, search, security, management, data extract & load, APIs
Hortonworks Partner Ecosystem
Hortonworks & SIs: Our Business Models Are 100% Complementary
• Systems Integrators are a cornerstone of our business model
• Enable high-value & repeatable solutions
• Leverage multi-party relationships to accelerate business
[Diagram: Hortonworks, the Systems Integrator, and the Customer in a three-way relationship]
Why Hortonworks?
• The most Apache Hadoop experience and expertise
– Reliable Hadoop from the experts, project leaders, architects and builders
– Collectively over 90 years of operational Hadoop experience (at least double that of the closest competitor)
• Influence community direction
– Provides a direct connection to drive innovation in the community
• Focus on the ecosystem
– Roadmap and vision to provide access to the wide ecosystem of enterprise applications, such as Teradata
• Industry momentum
– Collaborate across partners (ISVs/IHVs/SIs) to enable high-value solutions
Hortonworks Apache Hadoop Leadership
Hortonworkers… the builders, operators and core architects of Apache Hadoop
• Most experienced team running Hadoop in production at scale (> 5 years, 42000 nodes)
• All “stable” releases of Apache Hadoop have been shipped by Hortonworkers
Leadership
• VP and PMC of Hadoop: Arun Murthy
• Core architect of YARN: Arun Murthy
• Core architect of MapReduce 2: Arun Murthy
• VP & PMC of Pig: Daniel Dai
• VP of ZooKeeper: Mahadev Konar
• Inventor of HCatalog: Alan Gates
• Project lead for Ambari: Mahadev Konar
• Original project lead: Eric Baldeschwieler
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”
– Jeff Kelly, Wikibon
Hortonworks Data Platform
Operate | Integrate | Develop | Interact
Distributed Storage (HDFS)
Distributed Processing (MapReduce)
Query (Hive)
Scripting (Pig)
Metadata Services (HCatalog)
Non-Relational Database (HBase)
Data Integration Services (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop, Flume)
Management & Monitoring Services (Ambari, Zookeeper)
Workflow & Scheduling (Oozie)
Apache Hadoop Release Management
[Timeline: Apache Hadoop 1 releases 1.0.1, 1.0.2, 1.0.3, then 1.1.1, 1.1.2; HDP 1.0 is cut from this line.]
• Apache Hadoop release management is run by Hortonworks
– Matt Foley, release manager for Hadoop 1
– Arun Murthy, release manager for Hadoop 2
– Ashutosh Chauhan, release manager for Hive
– Daniel Dai, release manager for Pig
– Alan Gates, release manager for HCatalog
• Hadoop Core releases validated (and fixed) by Hortonworks
– ~1,300 end-to-end system tests run in house, using our IP, before any release can be made
• Hortonworks Data Platform is released directly from Apache Hadoop branches
Full Stack High Availability
HA Pairs
[Diagram: an HA cluster behind a core switch and two rack switches; the NameNode, JobTracker, and other daemons each have a pair of HA managers that fail over between racks.]
Full Stack High Availability
• Failover and restart for:
– NameNode
– JobTracker
– HBase and other services to come…
• Open API allows use of proven HA from multiple vendors (Red Hat & VMware)
• Minimized changes to clients and configuration
• Complementary to 2.0 HA efforts
– Server & operating system failure detection and VM restart
– Smart resource management ensures sufficient resources are available to restart VMs
Addresses HA needs on stable Apache Hadoop 1.0
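The detect-and-restart pattern these slides describe can be sketched as a minimal monitor loop. Everything here is illustrative: the `Service` and `HAManager` names are invented for the sketch and are not the vendor HA products named above.

```python
# Minimal sketch of the failover pattern described above: a manager watches a
# master daemon's heartbeat and restarts it on failure. Class names are
# hypothetical, not actual Hortonworks/Red Hat/VMware APIs.

class Service:
    """A master daemon such as the NameNode or JobTracker."""
    def __init__(self, name):
        self.name = name
        self.running = True

    def heartbeat(self):
        return self.running


class HAManager:
    """Restarts the watched service when a heartbeat is missed."""
    def __init__(self, service):
        self.service = service
        self.restarts = 0

    def check(self):
        if not self.service.heartbeat():
            self.service.running = True   # restart (possibly in a fresh VM)
            self.restarts += 1


namenode = Service("namenode")
manager = HAManager(namenode)
namenode.running = False    # simulate a crash
manager.check()
print(namenode.running, manager.restarts)  # True 1
```

Real HA managers also need fencing and resource checks before restarting (the "smart resource management" bullet above); this sketch shows only the detection/restart loop.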
Capacity Scheduler Delivers Multi-tenancy
• Queue definition
– % of total system memory
– % CPU utilization (not slot count)
• Queues per team
– Soft limits and hard limits, so a team can use the entire cluster if it is available
– Ownership and security built in
• Proactive resource management
– Lots of rules and observation points
– Don’t start another task if it will blow up the node
– Don’t start another task if other workloads are spinning up
• Better than Fair + Preemption (HDP supports all)
– Utilization not measured by slot count (which can blow up a node or cluster)
– Doesn’t start all tasks automatically (proactive vs. reactive)
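On Hadoop 1.x the Capacity Scheduler’s per-team queues are declared in capacity-scheduler.xml. The fragment below is a sketch only: the queue names and percentages are invented, and it assumes the Hadoop 1.x `mapred.capacity-scheduler` property convention, where `capacity` is the soft (guaranteed) share and `maximum-capacity` the hard ceiling.

```xml
<!-- Illustrative capacity-scheduler.xml fragment (Hadoop 1.x convention).
     Queue names and values are examples, not a recommendation. -->
<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.capacity</name>
    <value>60</value>            <!-- soft limit: guaranteed 60% share -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.maximum-capacity</name>
    <value>90</value>            <!-- hard limit when cluster is otherwise idle -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
    <value>40</value>
  </property>
</configuration>
```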
HCatalog METADATA
HCatalog
From raw Hadoop data (inconsistent, unknown, tool-specific access) to table access, aligned metadata, and a REST API
Apache HCatalog provides flexible metadata services across tools and external access
Metadata Services
• Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
• Accessibility: share data as tables in and out of HDFS • Availability: enables flexible, thin-client access via REST API
Shared table and schema management opens the platform
Options Lead to Complexity
Feature       | MapReduce       | Pig                                           | Hive
Record format | Key-value pairs | Tuple                                         | Record
Data model    | User defined    | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
Schema        | Encoded in app  | Declared in script or read by loader          | Read from metadata
Data location | Encoded in app  | Declared in script                            | Read from metadata
Data format   | Encoded in app  | Declared in script                            | Read from metadata
• Pig and MR users need to know a lot to write their apps
• When data schema, location, or format changes, Pig and MR apps must be rewritten, retested, and redeployed
• Hive users have to load data from Pig/MR users before they can access it
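The table above can be made concrete with a toy sketch: instead of hard-coding location, format, and schema in each app (the MapReduce column), apps look them up by table name from a shared metadata service, which is the idea behind HCatalog. The "metastore" here is just a dict; the real service is far richer, and all names are illustrative.

```python
# Toy illustration of shared metadata: apps depend only on the table name,
# so location/format/schema can change without a rewrite or redeploy.
# This dict stands in for the metastore; it is not the HCatalog API.

metastore = {
    "web_logs": {
        "location": "/data/web_logs/2012/09",
        "format": "tsv",
        "schema": ["user_id", "url", "timestamp"],
    }
}

def read_table(name):
    """Resolve a table's physical details from metadata at read time."""
    meta = metastore[name]
    return meta["location"], meta["format"], meta["schema"]

loc, fmt, schema = read_table("web_logs")
print(loc, fmt, schema[0])  # /data/web_logs/2012/09 tsv user_id
```

If operations later repartitions the data or switches the file format, only the metastore entry changes; `read_table` callers are untouched, which is exactly the insulation the HCatalog slides claim.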
Hadoop Ecosystem
[Diagram: the Hadoop ecosystem before HCatalog. The Hive metastore (tables, partitions, files, types) sits alongside HDFS (data nodes dn1…dnN). Hive (SQL) reads schema through its SerDe interface, Input/OutputFormats, and DDL/DML; Pig (scripting) uses its own Load/Store interface and Input/OutputFormat; MapReduce (Java) codes directly against Input/OutputFormat. Only Hive sees the metastore.]
Opening up Metadata to MR & Pig
HCat metadata layer
[Diagram: the same stack with the metastore opened up to all tools. MapReduce (Java) uses HCatInput/OutputFormat, Pig (scripting) uses the HCatLoad/Store interface, and Hive (SQL) keeps its SQL and SerDe interfaces, so all three share one metadata layer (tables, partitions, files, types) over HDFS (dn1…dnN).]
Tools With HCatalog
Feature       | MapReduce + HCatalog                     | Pig + HCatalog                                | Hive
Record format | Record                                   | Tuple                                         | Record
Data model    | int, float, string, maps, structs, lists | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
Schema        | Read from metadata                       | Read from metadata                            | Read from metadata
Data location | Read from metadata                       | Read from metadata                            | Read from metadata
Data format   | Read from metadata                       | Read from metadata                            | Read from metadata
• Pig/MR users can read schema from metadata
• Pig/MR users are insulated from schema, location, and format changes
• All users have access to other users’ data as soon as it is committed
[Diagram: HCatalog metadata services bridge the Hadoop cluster and existing infrastructure. Inside the cluster, Pig, HBase, and Hive applications issue DML against the metastore (create, describe); outside, applications, data stores, and visualization tools reach HCatalog over REST for DDL and DML.]
Services Integration
[Diagram: existing and new applications call HCatalog RESTful web services, which front MapReduce, Pig, and Hive over HDFS, HBase, and external stores.]
Provides RESTful API as “front door” for Hadoop
• Opens the door to languages other than Java
• Thin clients via web services vs. fat-clients in gateway
• Insulation from interface changes release to release
Opens Hadoop to integration with existing and new applications (alongside WebHDFS for file access)
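To make the "RESTful front door" concrete: a WebHDFS request is just an HTTP URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OP>`. The sketch below builds such URLs with the standard library only; the hostname is a placeholder and nothing is actually sent to a cluster.

```python
# Build WebHDFS REST URLs; no cluster is contacted. The host is hypothetical,
# and 50070 is the NameNode's default HTTP port in this era of Hadoop.
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, **params):
    """Compose a WebHDFS v1 request URL for the given operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# List a directory, then open a file: two common read operations.
print(webhdfs_url("namenode.example.com", 50070, "/data/logs", "LISTSTATUS"))
print(webhdfs_url("namenode.example.com", 50070,
                  "/data/logs/part-00000", "OPEN", offset=0))
```

Because the interface is plain HTTP, any language with an HTTP client becomes a thin Hadoop client, which is the point of the "languages other than Java" bullet above.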
Data Integration Services
• Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig
• Oozie scheduling allows you to manage and stage jobs
• Connectors for any database, business application or system
• Integrated HCatalog storage
Bridge the gap between legacy data & Hadoop
Simplify and speed development
Teradata and Hortonworks Partner to Provide the First Enterprise Reference Architecture for Hadoop and Big Data
Partnership provides clear path to enterprise for Hadoop
• Reference architecture that provides guidance on the best applications for Teradata, Teradata Aster, and Hadoop
• Clear partnership between industry and community leaders
• Deeper integration to ease data movement in/out of Hadoop
• Joint R&D and go-to-market
Ambari: Cluster Provisioning, Configuration Management, Monitoring
Ambari Architecture
• Installs your cluster onto target HW for you
• Manage, reconfigure from one place
• Monitor key and meaningful Hadoop metrics, not just OS / HW
• Scalable in line with Hadoop itself
[Diagram: the Hadoop cluster’s worker nodes (n1…nN) act as data and task sinks and run Ganglia, Puppet, and Nagios agents; the Ambari controller drives Puppet for install and reconfiguration and aggregates Ganglia metrics and Nagios alerts, which the operator views through a PHP portal.]
Ambari Live Demonstration
Why HDP?
ONLY Hortonworks Data Platform provides…
• Tightly aligned to core Apache Hadoop development line - Reduces risk for customers who may add custom coding or projects
• Enterprise Integration - HCatalog provides scalable, extensible integration point to Hadoop data
• Most reliable Hadoop distribution - Full stack high availability on v1 delivers the strongest SLA guarantees
• Multi-tenant scheduling and resource management - Capacity and fair scheduling optimizes cluster resources
• Integration with operations eases cluster management - Ambari is the most open and complete operations platform for Hadoop clusters
Hortonworks Support Subscriptions
Objective: help organizations to successfully develop and deploy solutions based upon Apache Hadoop
• Full-lifecycle technical support available
– Developer support for design, development and POCs
– Production support for staging and production environments
– Up to 24x7 with 1-hour response times
• Delivered by the Apache Hadoop experts
– Backed by the development team that has released every major version of Apache Hadoop since 0.1
• Forward compatibility
– Hortonworks’ leadership role helps ensure bug fixes and patches can be included in future versions of Hadoop projects
Hortonworks Training
Objective: help organizations overcome Hadoop knowledge gaps
• Expert role-based training for developers, administrators & data analysts
– Heavy emphasis on hands-on labs
– Extensive schedule of public training courses available (hortonworks.com/training)
• Comprehensive certification programs
• Customized, on-site courses available
Thank You! Questions & Answers