© Hortonworks Inc. 2012
Hortonworks & Systems Integrators
Mitch Ferguson, VP, Business Development
Rikin Shah, Director, Field Engineering
September 5, 2012
Big data changes the game
[Chart: data volume grows from megabytes to petabytes as sources expand beyond ERP, CRM, and web transaction data (purchase details, purchase and payment records, offer details, support contacts, customer touches, segmentation) into big-data sources: web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, user-generated content, mobile web, SMS/MMS sentiment, external demographics, HD video/audio/images, speech-to-text, product/service logs, social interactions & feeds, business data feeds, user click streams, sensors/RFID/devices, and spatial & GPS coordinates. Axis: increasing data variety and complexity.]
Transactions + Interactions + Observations = BIG DATA
Hortonworks Snapshot
The industry-leading and only 100% open source Apache Hadoop distribution
Most experienced open source leadership team:
– Rob Bearden, CEO (JBoss, SpringSource, i2, Oracle)
– Shaun Connolly, VP Strategy (VMware, SpringSource, Red Hat, JBoss)
– Mitch Ferguson, VP Business Development (SpringSource, VMware)
– John Kreisa, VP Marketing (Red Hat, Cloudera, MarkLogic, Business Objects)
– Ari Zilka, CPO (Terracotta, Accenture, Walmart.com)
– Greg Pavlik, VP Engineering (Oracle SOA & Integration platform)
Business model focused on customer success: Hadoop support, services & training
– Subscription support for Hortonworks Data Platform
– Training business: private and public classes available for developers & administrators
• Headquarters Sunnyvale, CA
• 100+ Employees
• Formed with core Apache Hadoop engineering team from Yahoo!
• 40+ engineers and architects including 25+ Hadoop committers
Hortonworks Business Strategy
Enable the next gen data management platform
• Accelerate the adoption of Apache Hadoop
• Create a vibrant ecosystem – ISVs, IHVs, Systems Integrators
• Provide world-class enterprise Support & Training
We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
Hortonworks Vision & Role
Be diligent stewards of the open source core 1
Be tireless innovators beyond the core 2
Provide robust data platform services & open APIs 3
Enable the ecosystem at each layer of the stack 4
Make the platform enterprise-ready & easy to use 5
Enabling Hadoop as Enterprise Big Data Platform
Data Platform Services & Open APIs (developer view)
• Applications, business tools, development tools, open APIs and access
• Data movement & integration, data management systems, systems management
• Installation & configuration, administration, monitoring, high availability, replication, multi-tenancy, ..
• Metadata, indexing, search, security, management, data extract & load, APIs
Hortonworks Partner Ecosystem
Hortonworks & SIs: Our Business Models Are 100% Complementary
• Systems Integrators are a cornerstone of our business model
• Enable high-value & repeatable solutions
• Leverage multi-party relationships to accelerate business
[Diagram: Hortonworks, the Systems Integrator, and the Customer in a three-way relationship]
Why Hortonworks?
• The most Apache Hadoop experience and expertise
– Reliable Hadoop from the experts, project leaders, architects and builders
– Collectively over 90 years of operational Hadoop experience (at least double that of the closest competitor)
• Influence community direction
– Provides a direct connection to drive innovation in the community
• Focus on the ecosystem
– Roadmap and vision to provide access to the wide ecosystem of enterprise applications, such as Teradata
• Industry momentum
– Collaborate across partners (ISVs/IHVs/SIs) to enable high-value solutions
Hortonworks Apache Hadoop Leadership
Hortonworkers… the builders, operators and core architects of Apache Hadoop
• Most experienced team running Hadoop in production at scale (> 5 years, 42000 nodes)
• All “stable” releases of Apache Hadoop have been shipped by Hortonworkers
Leadership
• VP and PMC of Hadoop: Arun Murthy
• Core architect of YARN: Arun Murthy
• Core architect of MapReduce 2: Arun Murthy
• VP & PMC of Pig: Daniel Dai
• VP of ZooKeeper: Mahadev Konar
• Inventor of HCatalog: Alan Gates
• Project lead for Ambari: Mahadev Konar
• Original project lead: Eric Baldeschwieler
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”
– Jeff Kelly, Wikibon
Hortonworks Data Platform
Operate | Integrate | Develop | Interact
Distributed Storage (HDFS)
Distributed Processing (MapReduce)
Query (Hive)
Scripting (Pig)
Metadata Services (HCatalog)
Non-Relational Database (HBase)
Data Integration Services (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop, Flume)
Management & Monitoring Services (Ambari, Zookeeper)
Workflow & Scheduling (Oozie)
Apache Hadoop Release Management
[Timeline: Apache Hadoop 1 releases 1.0.1, 1.0.2, 1.0.3, then 1.1.1, 1.1.2; HDP 1.0 is cut from this line.]
• Apache Hadoop release management is run by Hortonworks
– Matt Foley, release manager for Hadoop 1
– Arun Murthy, release manager for Hadoop 2
– Ashutosh Chauhan, release manager for Hive
– Daniel Dai, release manager for Pig
– Alan Gates, release manager for HCatalog
• Hadoop Core releases validated (and fixed) by Hortonworks
– ~1,300 end-to-end system tests run in house, using our IP, before any release can be made
• Hortonworks Data Platform is released directly from Apache Hadoop branches
Full Stack High Availability
HA Pairs
[Diagram: an HA cluster behind a core switch and two rack switches; the NameNode, JobTracker, and other daemons each have a pair of HA managers that fail over between racks.]
Full Stack High Availability
• Failover and restart for:
– NameNode
– JobTracker
– HBase and other services to come…
• Open API allows use of proven HA from multiple vendors (Red Hat & VMware)
• Minimized changes to clients and configuration
• Complementary to 2.0 HA efforts
– Server & operating system failure detection and VM restart
– Smart resource management ensures sufficient resources are available to restart VMs
Addresses HA needs on stable Apache Hadoop 1.0
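The detect-and-restart pattern these slides describe can be sketched as a minimal monitor loop. Everything here is illustrative: the `Service` and `HAManager` names are invented for the sketch and are not the vendor HA products named above.

```python
# Minimal sketch of the failover pattern described above: a manager watches a
# master daemon's heartbeat and restarts it on failure. Class names are
# hypothetical, not actual Hortonworks/Red Hat/VMware APIs.

class Service:
    """A master daemon such as the NameNode or JobTracker."""
    def __init__(self, name):
        self.name = name
        self.running = True

    def heartbeat(self):
        return self.running


class HAManager:
    """Restarts the watched service when a heartbeat is missed."""
    def __init__(self, service):
        self.service = service
        self.restarts = 0

    def check(self):
        if not self.service.heartbeat():
            self.service.running = True   # restart (possibly in a fresh VM)
            self.restarts += 1


namenode = Service("namenode")
manager = HAManager(namenode)
namenode.running = False    # simulate a crash
manager.check()
print(namenode.running, manager.restarts)  # True 1
```

Real HA managers also need fencing and resource checks before restarting (the "smart resource management" bullet above); this sketch shows only the detection/restart loop.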
Capacity Scheduler Delivers Multi-tenancy
• Queue definition
– % of total system memory
– % CPU utilization (not slot count)
• Queues per team
– Soft limits and hard limits, so a team can use the entire cluster if it is available
– Ownership and security built in
• Proactive resource management
– Lots of rules and observation points
– Don’t start another task if it will blow up the node
– Don’t start another task if other workloads are spinning up
• Better than Fair + Preemption (HDP supports all)
– Utilization not measured by slot count (which can blow up a node or cluster)
– Doesn’t start all tasks automatically (proactive vs. reactive)
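On Hadoop 1.x the Capacity Scheduler’s per-team queues are declared in capacity-scheduler.xml. The fragment below is a sketch only: the queue names and percentages are invented, and it assumes the Hadoop 1.x `mapred.capacity-scheduler` property convention, where `capacity` is the soft (guaranteed) share and `maximum-capacity` the hard ceiling.

```xml
<!-- Illustrative capacity-scheduler.xml fragment (Hadoop 1.x convention).
     Queue names and values are examples, not a recommendation. -->
<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.capacity</name>
    <value>60</value>            <!-- soft limit: guaranteed 60% share -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.maximum-capacity</name>
    <value>90</value>            <!-- hard limit when cluster is otherwise idle -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
    <value>40</value>
  </property>
</configuration>
```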
HCatalog METADATA
HCatalog
From raw Hadoop data (inconsistent, unknown, tool-specific access) to table access, aligned metadata, and a REST API
Apache HCatalog provides flexible metadata services across tools and external access
Metadata Services
• Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
• Accessibility: share data as tables in and out of HDFS • Availability: enables flexible, thin-client access via REST API
Shared table and schema management opens the platform
Options Lead to Complexity
Feature       | MapReduce       | Pig                                           | Hive
Record format | Key-value pairs | Tuple                                         | Record
Data model    | User defined    | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
Schema        | Encoded in app  | Declared in script or read by loader          | Read from metadata
Data location | Encoded in app  | Declared in script                            | Read from metadata
Data format   | Encoded in app  | Declared in script                            | Read from metadata
• Pig and MR users need to know a lot to write their apps
• When data schema, location, or format changes, Pig and MR apps must be rewritten, retested, and redeployed
• Hive users have to load data from Pig/MR users before they can access it
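The table above can be made concrete with a toy sketch: instead of hard-coding location, format, and schema in each app (the MapReduce column), apps look them up by table name from a shared metadata service, which is the idea behind HCatalog. The "metastore" here is just a dict; the real service is far richer, and all names are illustrative.

```python
# Toy illustration of shared metadata: apps depend only on the table name,
# so location/format/schema can change without a rewrite or redeploy.
# This dict stands in for the metastore; it is not the HCatalog API.

metastore = {
    "web_logs": {
        "location": "/data/web_logs/2012/09",
        "format": "tsv",
        "schema": ["user_id", "url", "timestamp"],
    }
}

def read_table(name):
    """Resolve a table's physical details from metadata at read time."""
    meta = metastore[name]
    return meta["location"], meta["format"], meta["schema"]

loc, fmt, schema = read_table("web_logs")
print(loc, fmt, schema[0])  # /data/web_logs/2012/09 tsv user_id
```

If operations later repartitions the data or switches the file format, only the metastore entry changes; `read_table` callers are untouched, which is exactly the insulation the HCatalog slides claim.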
Hadoop Ecosystem
[Diagram: the Hadoop ecosystem before HCatalog. The Hive metastore (tables, partitions, files, types) sits alongside HDFS (data nodes dn1…dnN). Hive (SQL) reads schema through its SerDe interface, Input/OutputFormats, and DDL/DML; Pig (scripting) uses its own Load/Store interface and Input/OutputFormat; MapReduce (Java) codes directly against Input/OutputFormat. Only Hive sees the metastore.]
Opening up Metadata to MR & Pig
HCat metadata layer
[Diagram: the same stack with the metastore opened up to all tools. MapReduce (Java) uses HCatInput/OutputFormat, Pig (scripting) uses the HCatLoad/Store interface, and Hive (SQL) keeps its SQL and SerDe interfaces, so all three share one metadata layer (tables, partitions, files, types) over HDFS (dn1…dnN).]
Tools With HCatalog
Feature       | MapReduce + HCatalog                     | Pig + HCatalog                                | Hive
Record format | Record                                   | Tuple                                         | Record
Data model    | int, float, string, maps, structs, lists | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
Schema        | Read from metadata                       | Read from metadata                            | Read from metadata
Data location | Read from metadata                       | Read from metadata                            | Read from metadata
Data format   | Read from metadata                       | Read from metadata                            | Read from metadata
• Pig/MR users can read schema from metadata
• Pig/MR users are insulated from schema, location, and format changes
• All users have access to other users’ data as soon as it is committed
[Diagram: HCatalog metadata services bridge the Hadoop cluster and existing infrastructure. Inside the cluster, Pig, HBase, and Hive applications issue DML against the metastore (create, describe); outside, applications, data stores, and visualization tools reach HCatalog over REST for DDL and DML.]
Services Integration
[Diagram: existing and new applications call HCatalog RESTful web services, which front MapReduce, Pig, and Hive over HDFS, HBase, and external stores.]
Provides RESTful API as “front door” for Hadoop
• Opens the door to languages other than Java
• Thin clients via web services vs. fat-clients in gateway
• Insulation from interface changes release to release
Opens Hadoop to integration with existing and new applications (alongside WebHDFS for file access)
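To make the "RESTful front door" concrete: a WebHDFS request is just an HTTP URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OP>`. The sketch below builds such URLs with the standard library only; the hostname is a placeholder and nothing is actually sent to a cluster.

```python
# Build WebHDFS REST URLs; no cluster is contacted. The host is hypothetical,
# and 50070 is the NameNode's default HTTP port in this era of Hadoop.
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, **params):
    """Compose a WebHDFS v1 request URL for the given operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# List a directory, then open a file: two common read operations.
print(webhdfs_url("namenode.example.com", 50070, "/data/logs", "LISTSTATUS"))
print(webhdfs_url("namenode.example.com", 50070,
                  "/data/logs/part-00000", "OPEN", offset=0))
```

Because the interface is plain HTTP, any language with an HTTP client becomes a thin Hadoop client, which is the point of the "languages other than Java" bullet above.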
Data Integration Services
• Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig
• Oozie scheduling allows you to manage and stage jobs
• Connectors for any database, business application or system
• Integrated HCatalog storage
Bridge the gap between legacy data & Hadoop
Simplify and speed development
Teradata and Hortonworks Partner to Provide the First Enterprise Reference Architecture for Hadoop and Big Data
Partnership provides clear path to enterprise for Hadoop
• Reference architecture that provides guidance on the best applications for Teradata, Teradata Aster, and Hadoop
• Clear partnership between industry and community leaders
• Deeper integration to ease data movement in/out of Hadoop
• Joint R&D and go-to-market
Ambari: Cluster Provisioning, Configuration Management, Monitoring
Ambari Architecture
• Installs your cluster onto target HW for you
• Manage, reconfigure from one place
• Monitor key and meaningful Hadoop metrics, not just OS / HW
• Scalable in line with Hadoop itself
[Diagram: the Hadoop cluster’s worker nodes (n1…nN) act as data and task sinks and run Ganglia, Puppet, and Nagios agents; the Ambari controller drives Puppet for install and reconfiguration and aggregates Ganglia metrics and Nagios alerts, which the operator views through a PHP portal.]
Ambari Live Demonstration
Why HDP?
ONLY Hortonworks Data Platform provides…
• Tightly aligned to core Apache Hadoop development line - Reduces risk for customers who may add custom coding or projects
• Enterprise Integration - HCatalog provides scalable, extensible integration point to Hadoop data
• Most reliable Hadoop distribution - Full stack high availability on v1 delivers the strongest SLA guarantees
• Multi-tenant scheduling and resource management - Capacity and fair scheduling optimizes cluster resources
• Integration with operations eases cluster management - Ambari is the most open and complete operations platform for Hadoop clusters
Hortonworks Support Subscriptions
Objective: help organizations to successfully develop and deploy solutions based upon Apache Hadoop
• Full-lifecycle technical support available
– Developer support for design, development and POCs
– Production support for staging and production environments
– Up to 24x7 with 1-hour response times
• Delivered by the Apache Hadoop experts
– Backed by the development team that has released every major version of Apache Hadoop since 0.1
• Forward compatibility
– Hortonworks’ leadership role helps ensure bug fixes and patches can be included in future versions of Hadoop projects
Hortonworks Training
Objective: help organizations overcome Hadoop knowledge gaps
• Expert role-based training for developers, administrators & data analysts
– Heavy emphasis on hands-on labs
– Extensive schedule of public training courses available (hortonworks.com/training)
• Comprehensive certification programs
• Customized, on-site courses available
Thank You! Questions & Answers