Post on 26-Jan-2015
1©MapR Technologies - Confidential
Using Standard File-Based Applications and SQL-Based Tools with Hadoop
Tomer Shiran
tshiran@maprtech.com
Director of Product Management, MapR Technologies
http://info.mapr.com/HUG-7-2012
The MapR Distribution for Apache Hadoop
The open, enterprise-grade distribution for Apache Hadoop
– Open source components
  • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
– Enhancements to make Hadoop more open and enterprise-grade
Fastest growing distribution
– Thousands of clusters deployed
Now available as a service with Amazon Elastic MapReduce (EMR)
– http://aws.amazon.com/elasticmapreduce/mapr
MapR
Make Hadoop more open
Make Hadoop enterprise-grade
This presentation
Not All Applications Use the Hadoop APIs
Applications and libraries that use files and/or SQL
Applications and libraries that use the Hadoop APIs
30 years, 100,000s of applications, 10,000s of libraries, 10s of programming languages
Hadoop Needs Industry-Standard Interfaces
Hadoop API
• MapReduce and HBase applications
• Mostly custom-built
NFS
• File-based applications
• Supported by most operating systems
ODBC
• SQL-based tools
• Supported by most BI applications and query builders
NFS
Your Data is Your Data
HDFS-based Hadoop distributions do not (cannot) support NFS
Your data is your data – make sure you can access it
– Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?
Access to HDFS source code != access to your data
The NFS Protocol
RFC 1813
Very simple protocol
Random reads/writes
– Read count bytes from offset offset of file file
– Write buffer data to offset offset of file file
HDFS does not support random writes so it cannot support NFS
WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;
struct WRITE3args {
    nfs_fh3    file;
    offset3    offset;
    count3     count;
    stable_how stable;
    opaque     data<>;
};
READ3res NFSPROC3_READ(READ3args) = 6;
struct READ3args {
    nfs_fh3 file;
    offset3 offset;
    count3  count;
};
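The random read/write pattern above is what ordinary file-based applications rely on. A minimal sketch in Python, assuming a POSIX system: `write_at` and `read_at` are hypothetical helper names, and the demo uses a local temporary file so it runs anywhere. On an NFS-mounted path, calls like these would be carried over the wire by the NFS client as WRITE and READ requests with a file handle, an offset, and a count.

```python
import os
import tempfile


def write_at(path, offset, data):
    """Overwrite bytes at a given offset, leaving the rest of the file intact."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, data, offset)  # returns the number of bytes written
    finally:
        os.close(fd)


def read_at(path, offset, count):
    """Read up to `count` bytes starting at `offset`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


def demo():
    # A local temporary file stands in for a file on an NFS mount (assumption).
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"0123456789")
        path = f.name
    try:
        write_at(path, 4, b"AB")     # random write in the middle of the file
        return read_at(path, 2, 6)   # random read spanning the modified bytes
    finally:
        os.remove(path)
```

The write in the middle of an existing file is the operation an append-only storage layer cannot serve, which is the point the slide makes about random writes.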
Hadoop Was Designed to Support Multiple Storage Layers
[Diagram: storage layers behind the Hadoop FileSystem API]
MapReduce
Hadoop FileSystem API (o.a.h.fs.FileSystem interface)
– HDFS: o.a.h.hdfs.DistributedFileSystem
– S3: o.a.h.fs.s3native.NativeS3FileSystem
– Local File System: o.a.h.fs.LocalFileSystem
– FTP: o.a.h.fs.ftp.FTPFileSystem
– MapR storage layer: com.mapr.fs.MapRFileSystem
NFS interface
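The pluggable design in this diagram can be pictured as a lookup from a URI scheme to an implementation class. The sketch below is only an illustration in Python, not Hadoop code: `implementation_for` is a hypothetical helper, the scheme names are assumptions, and the class names are the ones from the diagram with `o.a.h.` expanded to `org.apache.hadoop.`.

```python
from urllib.parse import urlparse

# Scheme -> implementation class, as named in the diagram.
# The scheme keys are assumptions made for this illustration.
IMPLEMENTATIONS = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3n": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "file": "org.apache.hadoop.fs.LocalFileSystem",
    "ftp": "org.apache.hadoop.fs.ftp.FTPFileSystem",
    "maprfs": "com.mapr.fs.MapRFileSystem",
}


def implementation_for(uri):
    """Return the class name that would serve this URI, based on its scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in IMPLEMENTATIONS:
        raise ValueError("no storage layer registered for scheme %r" % scheme)
    return IMPLEMENTATIONS[scheme]
```

Because MapReduce only talks to the common interface, swapping the storage layer changes which entry is selected and leaves the job code alone.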
One NFS Gateway
Multiple NFS Gateways
Multiple NFS Gateways with Load Balancing
Multiple NFS Gateways with NFS HA (VIPs)
Customer Examples: Import/Export Data
Network security vendor
– Network packet captures from switches are streamed into the cluster
– New pattern definitions are loaded into online IPS via NFS
Online measurement company
– Clickstreams from application servers are streamed into the cluster
SaaS company
– Exporting a database to Hadoop over NFS
Ad exchange
– Bids and transactions are streamed into the cluster
Customer Examples: Productivity and Operations
Retailer
– Operational scripts are easier with NFS than DFS + MapReduce
  • chmod/chown, file system searches/greps, make, tab-complete
– Consolidate object store with analytics
Credit card company
– User and project home directories on Linux gateways
  • Local files, scripts, source code, …
  • Administrators manage quotas, snapshots/backups, …
Large Internet company
– Web servers serve MapReduce results (item relationships) directly from the cluster
Email marketing company
– Object store with HBase and NFS
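The retailer example (searches, greps, chmod) comes down to using ordinary file tools against the cluster once it is mounted. A minimal sketch in Python using only the standard library: `grep_tree` and `make_read_only` are hypothetical helper names, and the demo runs against a temporary directory that stands in for an NFS mount point, which is an assumption of this sketch.

```python
import os
import stat
import tempfile


def grep_tree(root, needle):
    """Return the sorted paths of files under `root` that contain `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="ignore") as fh:
                if any(needle in line for line in fh):
                    hits.append(path)
    return sorted(hits)


def make_read_only(path):
    """Equivalent of `chmod 440`: read-only for owner and group."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP)


def demo():
    # The temporary directory stands in for a mounted cluster path (assumption).
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "logs"))
        with open(os.path.join(root, "logs", "a.log"), "w") as fh:
            fh.write("ok\nERROR disk full\n")
        with open(os.path.join(root, "logs", "b.log"), "w") as fh:
            fh.write("ok\n")
        hits = grep_tree(root, "ERROR")
        make_read_only(hits[0])
        mode = stat.S_IMODE(os.stat(hits[0]).st_mode)
        return [os.path.relpath(h, root) for h in hits], oct(mode)
```

Nothing here is Hadoop-specific, which is the point: the same script works on a local disk or on a mounted cluster path.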
ODBC
ODBC
ODBC – Open DataBase Connectivity
– Open standard API for accessing a SQL-based backend
– Developed by Microsoft and Simba Technologies in 1992
Flagship API for SQL-based BI and reporting
– Excel, Tableau, MicroStrategy, Crystal Reports, …
Advanced ODBC drivers use the latest 3.52 specification
19©MapR Technologies - Confidential
MapR ODBC Driver
MapR provides a Hive ODBC 3.52 driver
– Developed in partnership with ODBC inventor Simba Technologies
– Compliant with latest ODBC 3.52 specification
  • 32- and 64-bit platform support
  • Windows and Linux
Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
SQLizer enables seamless connectivity
– Provides ANSI SQL-92 front-end
– Targeted for existing apps that generate standard SQL queries
– Transforms SQL query into HiveQL query
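From an application's point of view, an ODBC driver means standard SQL goes in and rows come out. A minimal sketch in Python, assuming the third-party `pyodbc` package is installed and that an ODBC data source has already been configured for the driver. The DSN name `MapR Hive`, the table name, and the helpers `connection_string` and `fetch_rows` are hypothetical; none of them come from the slides.

```python
def connection_string(dsn, **options):
    """Build an ODBC connection string such as 'DSN=MapR Hive;UID=analyst'."""
    parts = ["DSN=%s" % dsn]
    parts += ["%s=%s" % (key, value) for key, value in sorted(options.items())]
    return ";".join(parts)


def fetch_rows(dsn, sql):
    """Run a standard SQL query through the ODBC data source and return rows."""
    import pyodbc  # third-party; imported here so the helper above has no dependency

    # autocommit=True is an assumption: it avoids relying on transaction support.
    conn = pyodbc.connect(connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        conn.close()


# Usage, with a hypothetical DSN and table:
# rows = fetch_rows("MapR Hive", "SELECT page, COUNT(*) FROM clicks GROUP BY page")
```

The application only sees the ODBC interface; any translation from SQL to HiveQL happens inside the driver.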
Example: Tableau
Example: Tableau
Example: Open source query builder (Kaimon)
Example: Microsoft Excel
Join MapR
Join the fastest growing Hadoop company
Open positions in every discipline
– Engineers
– Solution Architects
– Product Management
Email jobs@mapr.com
Time for Questions
Download slides or send me an email
– http://info.mapr.com/HUG-7-2012
Download MapR to learn more
– www.mapr.com/download