July 2012 HUG: Using Standard File-Based Applications and SQL-Based Tools with Hadoop

Post on 26-Jan-2015


description

MapR makes Hadoop a more open platform by supporting industry-standard interfaces, including NFS and ODBC. The NFS interface enables users to leverage standard file-based applications, and makes it easier to get data into and out of the cluster, while the ODBC interface enables users to leverage standard BI tools and query builders. This talk covers the motivation for supporting industry-standard interfaces as well as several real-world use cases. In addition, this talk explains the technical details behind these capabilities and how they actually work.

transcript

1©MapR Technologies - Confidential

Using Standard File-Based Applications and SQL-Based Tools with Hadoop


Tomer Shiran
Director of Product Management, MapR Technologies
tshiran@maprtech.com

http://info.mapr.com/HUG-7-2012


The MapR Distribution for Apache Hadoop

The open, enterprise-grade distribution for Apache Hadoop
– Open source components
  • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
– Enhancements to make Hadoop more open and enterprise-grade

Fastest growing distribution
– Thousands of clusters deployed

Now available as a service with Amazon Elastic MapReduce (EMR)
– http://aws.amazon.com/elasticmapreduce/mapr


MapR

Make Hadoop more open

Make Hadoop enterprise-grade

This presentation


Not All Applications Use the Hadoop APIs

Applications and libraries that use files and/or SQL

Applications and libraries that use the Hadoop APIs

30 years
100,000s of applications
10,000s of libraries
10s of programming languages


Hadoop Needs Industry-Standard Interfaces

Hadoop API
• MapReduce and HBase applications
• Mostly custom-built

NFS
• File-based applications
• Supported by most operating systems

ODBC
• SQL-based tools
• Supported by most BI applications and query builders


NFS


Your Data is Your Data

HDFS-based Hadoop distributions do not (cannot) support NFS

Your data is your data – make sure you can access it
– Why store your data in a system which cannot be accessed by 95% of the world’s applications and libraries?

Access to HDFS source code != access to your data


The NFS Protocol

RFC 1813

Very simple protocol

Random reads/writes
– Read <count> bytes from offset <offset> of file <file>
– Write buffer <data> to offset <offset> of file <file>

HDFS does not support random writes so it cannot support NFS

WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

struct WRITE3args {
    nfs_fh3     file;
    offset3     offset;
    count3      count;
    stable_how  stable;
    opaque      data<>;
};

READ3res NFSPROC3_READ(READ3args) = 6;

struct READ3args {
    nfs_fh3  file;
    offset3  offset;
    count3   count;
};
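The READ and WRITE procedures above map closely onto positional file I/O. As a rough illustration (not MapR code), the sketch below performs the same kind of offset-based reads and writes with Python's os.pread and os.pwrite, which are available on Unix-like systems. It uses a temporary local file so it runs anywhere. On a real cluster the path would point into an NFS mount of the cluster, which is an assumption here. The helper names write_at, read_at and demo are hypothetical.

```python
import os
import tempfile


def write_at(path, offset, data):
    """Write `data` starting at byte `offset` (compare WRITE3args: file, offset, count, data)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def read_at(path, offset, count):
    """Read up to `count` bytes starting at byte `offset` (compare READ3args: file, offset, count)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


def demo():
    # A temporary file stands in for a file on an NFS-mounted cluster (assumption).
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "example.dat")
        write_at(path, 0, b"0123456789")
        write_at(path, 4, b"XY")  # random write: overwrite two bytes in the middle
        return read_at(path, 2, 6)


if __name__ == "__main__":
    print(demo())
```

The second write_at call is the kind of in-place update that an append-only file system cannot serve, which is the point the slide makes about HDFS.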


Hadoop Was Designed to Support Multiple Storage Layers

[Diagram: MapReduce sits on top of the o.a.h.fs.FileSystem interface, which has one implementation per storage layer]

HDFS: o.a.h.hdfs.DistributedFileSystem
S3: o.a.h.fs.s3native.NativeS3FileSystem
Local File System: o.a.h.fs.LocalFileSystem
FTP: o.a.h.fs.ftp.FTPFileSystem
MapR storage layer: com.mapr.fs.MapRFileSystem

Interfaces labeled in the diagram: NFS interface, Hadoop FileSystem API
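The pluggable design on this slide can be pictured as a lookup from a URI scheme to the class that implements the FileSystem interface. The sketch below is a simplified Python illustration of that dispatch idea, not Hadoop's actual code. The class names come from the slide. The scheme names (hdfs, s3n, file, ftp, maprfs) and the function implementation_for are assumptions made for the example.

```python
from urllib.parse import urlparse

# Scheme -> implementing class. Class names are from the slide;
# the scheme keys are assumptions made for this illustration.
FS_IMPLEMENTATIONS = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3n": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "file": "org.apache.hadoop.fs.LocalFileSystem",
    "ftp": "org.apache.hadoop.fs.ftp.FTPFileSystem",
    "maprfs": "com.mapr.fs.MapRFileSystem",
}


def implementation_for(uri, default_scheme="file"):
    """Return the class name registered for the URI's scheme.

    URIs without a scheme fall back to `default_scheme`.
    """
    scheme = urlparse(uri).scheme or default_scheme
    try:
        return FS_IMPLEMENTATIONS[scheme]
    except KeyError:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}") from None


if __name__ == "__main__":
    for uri in ("hdfs://namenode:8020/logs", "maprfs:///user/data", "/tmp/local.txt"):
        print(uri, "->", implementation_for(uri))
```

Because callers only depend on the interface, swapping the storage layer means changing which implementation a scheme resolves to, not changing the application.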


One NFS Gateway


Multiple NFS Gateways


Multiple NFS Gateways with Load Balancing


Multiple NFS Gateways with NFS HA (VIPs)


Customer Examples: Import/Export Data

Network security vendor
– Network packet captures from switches are streamed into the cluster
– New pattern definitions are loaded into online IPS via NFS

Online measurement company
– Clickstreams from application servers are streamed into the cluster

SaaS company
– Exporting a database to Hadoop over NFS

Ad exchange
– Bids and transactions are streamed into the cluster


Customer Examples: Productivity and Operations

Retailer
– Operational scripts are easier with NFS than DFS + MapReduce
  • chmod/chown, file system searches/greps, make, tab-complete
– Consolidate object store with analytics

Credit card company
– User and project home directories on Linux gateways
  • Local files, scripts, source code, …
  • Administrators manage quotas, snapshots/backups, …

Large Internet company
– Web servers serve MapReduce results (item relationships) directly from the cluster

Email marketing company
– Object store with HBase and NFS
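To make the "operational scripts" point concrete: once the cluster is mounted over NFS, an ordinary script can walk and search it like any other directory tree. The sketch below is a hypothetical grep-style helper (grep_tree is not a MapR or Hadoop API). It runs against a temporary directory that stands in for the mount point, since the real mount path is an assumption that depends on the deployment.

```python
import os
import tempfile


def grep_tree(root, needle):
    """Yield (path, line_number, line) for each line under `root` that contains `needle`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going if a file is not valid text.
            with open(path, "r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


def demo():
    # The temporary directory stands in for an NFS-mounted cluster path (assumption).
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "logs"))
        with open(os.path.join(root, "logs", "job.log"), "w") as handle:
            handle.write("INFO start\nERROR disk full\nINFO done\n")
        return [(os.path.basename(path), number, text)
                for path, number, text in grep_tree(root, "ERROR")]


if __name__ == "__main__":
    print(demo())
```

Nothing in the helper knows about Hadoop, which is the appeal: the same script works on a laptop directory and on a mounted cluster.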


ODBC


ODBC – Open DataBase Connectivity
– Open standard API for accessing a SQL-based backend
– Developed by Microsoft and Simba Technologies in 1992

Flagship API for SQL-based BI and reporting
– Excel, Tableau, MicroStrategy, Crystal Reports, …

Advanced ODBC drivers use the latest 3.52 specification


MapR ODBC Driver

MapR provides a Hive ODBC 3.52 driver
– Developed in partnership with ODBC inventor Simba Technologies
– Compliant with latest ODBC 3.52 specification
  • 32- and 64-bit platform support
  • Windows and Linux

Enables direct SQL access to MapR-stored data by translating SQL to HiveQL

SQLizer enables seamless connectivity
– Provides ANSI SQL-92 front-end
– Targeted for existing apps that generate standard SQL queries
– Transforms SQL query into HiveQL query
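From an application's point of view, an ODBC driver like this is reached through a data source name (DSN) and plain SQL. The sketch below shows what that might look like from Python using the third-party pyodbc package. The DSN name, the table name and the helper functions are hypothetical, and the sketch assumes a DSN for the Hive ODBC driver has already been configured on the machine. The import is done lazily so the connection-string helper can be used without pyodbc installed.

```python
def connection_string(dsn, **options):
    """Build an ODBC connection string such as 'DSN=name;UID=user'."""
    parts = [f"DSN={dsn}"] + [f"{key}={value}" for key, value in options.items()]
    return ";".join(parts)


def run_query(dsn, sql):
    """Run `sql` against the data source `dsn` and return all rows.

    Requires the third-party pyodbc package and a configured DSN (both assumptions).
    """
    import pyodbc  # imported lazily; not part of the standard library

    # autocommit=True because the sketch only issues read-only queries.
    connection = pyodbc.connect(connection_string(dsn), autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()


if __name__ == "__main__":
    # "Example Hive DSN" and "web_logs" are placeholder names for illustration.
    print(connection_string("Example Hive DSN"))
    # rows = run_query("Example Hive DSN", "SELECT COUNT(*) FROM web_logs")
```

BI tools such as the ones named on the previous slide do essentially the same thing: they pick a DSN and send standard SQL, leaving the translation to HiveQL to the driver.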


Example: Tableau



Example: Open source query builder (Kaimon)


Example: Microsoft Excel


Join MapR

Join the fastest growing Hadoop company

Open positions in every discipline
– Engineers
– Solution Architects
– Product Management

Email jobs@mapr.com


Time for Questions

Download slides or send me an email
– http://info.mapr.com/HUG-7-2012

Download MapR to learn more
– www.mapr.com/download