Date posted: 27-Aug-2014 | Category: Software | Uploaded by: hortonworks
Page 1 © Hortonworks Inc. 2014
Discover HDP 2.1 Interactive SQL Query in Hadoop with Apache Hive
Hortonworks. We do Hadoop.
Speakers
Justin Sears
Hortonworks Product Marketing Manager
Carter Shanklin
Hortonworks Director of Product Management & PM for Apache Hive in Hortonworks Data Platform
Owen O’Malley
Hortonworks Co-Founder, Engineer & Committer for Apache Hive project
A Modern Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
DATA SYSTEM
  REPOSITORIES: RDBMS, EDW, MPP
  ENTERPRISE HADOOP: Governance & Integration, Data Access, Data Management, Security, Operations
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS:
  Batch: MapReduce
  Script: Pig
  SQL: Hive/Tez, HCatalog
  NoSQL: HBase, Accumulo
  Stream: Storm
  Search: Solr
  Others: In-Memory Analytics, ISV engines
DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System)
SECURITY: Authentication, Authorization, Accounting, Data Protection
  Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
HDP 2.1: Enterprise Hadoop
DATA ACCESS: SQL (Hive/Tez, HCatalog)
YARN: Data Operating System
Apache Hive After the Stinger Initiative: Speed, Scale & SQL Compliance
Hive: SQL Analytics For Any Data Size
Sensor, Mobile, Weblog, Operational / MPP
Store and Query all Data in Hive
Use Existing SQL Tools and Existing SQL Processes
SQL Queries
The Stinger Initiative: Complete
• Community initiative around Hive
• Enables Hive to support interactive workloads
• Enhances Hive’s standard SQL interface for Hadoop
• Improves existing tools & preserves investments
Tez (Execution Engine) + Vectorized Query (Query Processing) + ORCFile (File Format) = 100X
New in HDP 2.1: Speed
New Features for Speed
• Interactive query using Hive on Tez
• Vectorized query execution
• Cost-based optimizer
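Vectorized query execution replaces row-at-a-time evaluation with operators that process a batch of rows stored column by column. The Python sketch below contrasts the two loop shapes for a query of the form SUM(price * qty) WHERE qty >= n. It is loosely modelled on Hive's VectorizedRowBatch, assuming a batch size of 1024 rows and a selection vector that records which rows passed the filter; the function names and the price/qty columns are hypothetical. It illustrates control flow and data layout only: Hive's speedup comes from tight Java loops over primitive arrays, which Python loops do not reproduce.

```python
from typing import List

# Assumption: 1024 rows per batch, mirroring the default size of Hive's VectorizedRowBatch.
BATCH_SIZE = 1024


def row_at_a_time_sum(price: List[int], qty: List[int], min_qty: int) -> int:
    """SUM(price * qty) WHERE qty >= min_qty, evaluated one row per iteration."""
    total = 0
    for p, q in zip(price, qty):
        if q >= min_qty:  # filter, expression and aggregate interleaved per row
            total += p * q
    return total


def vectorized_sum(price: List[int], qty: List[int], min_qty: int,
                   batch_size: int = BATCH_SIZE) -> int:
    """Same query, evaluated one batch of column values at a time."""
    total = 0
    for start in range(0, len(price), batch_size):
        end = min(start + batch_size, len(price))
        # Filter operator: one pass over the qty column builds a selection vector
        # holding the indices of the rows that satisfy the predicate.
        selected = [i for i in range(start, end) if qty[i] >= min_qty]
        # Projection operator: compute price * qty for the selected rows only.
        products = [price[i] * qty[i] for i in selected]
        # Aggregation operator: fold the whole batch into the running total.
        total += sum(products)
    return total


if __name__ == "__main__":
    price = [100, 250, 75, 300]  # hypothetical prices in cents
    qty = [1, 5, 2, 10]
    print(row_at_a_time_sum(price, qty, 2), vectorized_sum(price, qty, 2))  # 4400 4400
```

Integer cents keep the two results exactly equal; with floating-point columns the batched summation order could differ from the row-at-a-time order in the last bits.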
New in HDP 2.1: More Than 10 New SQL Features
New SQL Features
• Subqueries for IN / NOT IN
• Support for EXISTS and NOT EXISTS
• Common table expressions (CTEs)
• Support for CHAR datatype
• Scale and precision support for DECIMAL datatype
• JOIN conditions in the WHERE clause
• Cancel jobs via ODBC / JDBC
• Support for Unicode column names
• Permanent functions
• Stream data into Hive from Flume (experimental feature)
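Several of these features are standard SQL constructs. The sketch below runs a common table expression, IN / NOT IN subqueries and a correlated EXISTS subquery through Python's built-in sqlite3 module, with SQLite standing in for Hive so the example runs without a cluster. The customer and sales tables, their columns and their rows are hypothetical. The statements were not tested against Hive 13 (0.13); Hive accepts these constructs but restricts where subqueries may appear, so check the Hive Language Manual before porting them.

```python
import sqlite3

# SQLite stands in for Hive here; the tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, lname TEXT);
    CREATE TABLE sales (customer_id INTEGER, store_sales REAL);
    INSERT INTO customer VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
    INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

QUERIES = {
    # Common table expression: name an intermediate result with WITH, then reuse it.
    "cte": """
        WITH totals AS (
            SELECT customer_id, SUM(store_sales) AS total
            FROM sales GROUP BY customer_id)
        SELECT c.lname, t.total
        FROM customer c JOIN totals t ON c.customer_id = t.customer_id
        ORDER BY t.total DESC""",
    # Uncorrelated IN / NOT IN subqueries in the WHERE clause.
    "in": """
        SELECT lname FROM customer
        WHERE customer_id IN (SELECT customer_id FROM sales) ORDER BY lname""",
    "not_in": """
        SELECT lname FROM customer
        WHERE customer_id NOT IN (SELECT customer_id FROM sales) ORDER BY lname""",
    # Correlated EXISTS: keep customers with at least one sale above 8.
    "exists": """
        SELECT c.lname FROM customer c
        WHERE EXISTS (SELECT 1 FROM sales s
                      WHERE s.customer_id = c.customer_id AND s.store_sales > 8)""",
}


def run(name: str) -> list:
    """Execute one of the sample queries and return all result rows."""
    return conn.execute(QUERIES[name]).fetchall()


if __name__ == "__main__":
    for name in QUERIES:
        print(name, run(name))
```

With these rows, the CTE returns Smith (15.0) ahead of Jones (7.5), the IN and NOT IN queries split the customers into buyers (Jones, Smith) and non-buyers (Lee), and EXISTS keeps only Smith.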
Hive’s Journey to SQL Compliance
Evolution of SQL Compliance in Hive
| SQL Datatypes | SQL Semantics |
| INT/TINYINT/SMALLINT/BIGINT | SELECT, INSERT |
| FLOAT/DOUBLE | GROUP BY, ORDER BY, HAVING |
| BOOLEAN | JOIN on explicit join key |
| ARRAY, MAP, STRUCT, UNION | Inner, outer, cross and semi joins |
| STRING | Sub-queries in the FROM clause |
| BINARY | ROLLUP and CUBE |
| TIMESTAMP | UNION |
| DECIMAL | Standard aggregations (sum, avg, etc.) |
| DATE | Custom Java UDFs |
| VARCHAR | Windowing functions (OVER, RANK, etc.) |
| CHAR | Advanced UDFs (ngram, XPath, URL) |
| Interval Types | Sub-queries for IN/NOT IN, HAVING |
| | JOINs in WHERE Clause |
| | Common Table Expressions (WITH Clause) |
| | INSERT / UPDATE / DELETE |
Legend: Available, Roadmap; Hive 11, Hive 12, Hive 13
New in HDP 2.1: Other Improvements
Other New Hive Features
SQL standard authorization
Hive job visualizer in Ambari
PAM authentication support
SSL encryption support in HiveServer2
Dynamic partition scalability
Demo
FoodMart Dataset
• FoodMart dataset, replicated 275 times (~10 GB of data)
• Queries run locally on an HDP 2.1 Sandbox
• Queries perform some customer analytics
Tables: sales_fact_1997, customer, time_by_day, other dimension tables
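The slides do not include the demo queries themselves. As a hypothetical illustration of the kind of customer analytics this schema supports, and of the HDP 2.1 feature that allows JOIN conditions in the WHERE clause, the sketch below joins sales_fact_1997 to customer and time_by_day with comma-separated tables and WHERE-clause join predicates. It again uses SQLite in place of Hive, keeps only a few columns of each FoodMart table (the column subset is an assumption), and loads made-up rows rather than the 10 GB demo data.

```python
import sqlite3

# Hypothetical rows; the column subset of the FoodMart tables is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, lname TEXT);
    CREATE TABLE time_by_day (time_id INTEGER, the_month TEXT, the_year INTEGER);
    CREATE TABLE sales_fact_1997 (customer_id INTEGER, time_id INTEGER, store_sales REAL);
    INSERT INTO customer VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO time_by_day VALUES (1, 'January', 1997), (2, 'February', 1997);
    INSERT INTO sales_fact_1997 VALUES (1, 1, 12.0), (1, 1, 3.0), (2, 2, 20.0), (2, 1, 4.0);
""")

# Monthly spend per customer. The join conditions live in the WHERE clause
# (comma-separated FROM list) instead of explicit JOIN ... ON syntax.
MONTHLY_SPEND = """
    SELECT c.lname, t.the_month, SUM(s.store_sales) AS total
    FROM sales_fact_1997 s, customer c, time_by_day t
    WHERE s.customer_id = c.customer_id
      AND s.time_id = t.time_id
      AND t.the_year = 1997
    GROUP BY c.lname, t.the_month
    ORDER BY total DESC
"""


def monthly_spend() -> list:
    """Return (lname, month, total store_sales) rows, largest total first."""
    return conn.execute(MONTHLY_SPEND).fetchall()


if __name__ == "__main__":
    for row in monthly_spend():
        print(row)
```

The WHERE-clause predicates here produce the same result as writing INNER JOIN ... ON with the same conditions; the comma form is what the "JOIN conditions in the WHERE clause" feature lets existing SQL tools send to Hive unchanged.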
Learn More About Hive & The Stinger Initiative
Hortonworks.com/labs/stinger/
Register for the remaining 5 Discover HDP 2.1 Webinars: Hortonworks.com/webinars
Next Webinar: Apache Falcon for Data Governance in Hadoop, Wednesday, May 21, 10am Pacific