MongoDB & Hadoop, Sittin' in a Tree

Posted on 12-May-2015


K Young - CEO, Mortar

MongoDB + Hadoop, Sittin’ in a Tree

Overview of This Session

• Super-fast intro to Hadoop, Pig
• Why MongoDB + Pig?
• Demo: Move data MongoDB <=> Pig
• Demo: Processing data with Pig

Hadoop: Super-Fast Intro

• From Google research
• Built for massive parallelization
• Batch (for now)
• Widely applicable
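To make "built for massive parallelization" concrete, here is a minimal, single-process Python sketch of the MapReduce programming model that Hadoop distributes across a cluster: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. It only illustrates the model; the function names are hypothetical and none of this is Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key; Hadoop does this between map and reduce tasks.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine every value seen for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["mongo and hadoop", "hadoop and pig"]))
```

Because each map call sees one record and each reduce call sees one key, both phases can run on many machines at once; only the shuffle needs to move data between them.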


Example applications:

• Social graph
• Predict
• Detect
• Genetics


Pig: On Hadoop

• Less code
• Expressive code
• Compiles to MR (MapReduce)
• Insulates you from Hadoop's MapReduce API
• Popular (LinkedIn, Twitter, Salesforce, Yahoo, Stanford University...)

Pig: Brief, Expressive, Like Procedural SQL

(Thanks: Twitter Hadoop World presentation)

The Same Script, in MapReduce: For Serious

Alternatives to Hadoop: MongoDB Native MapReduce

• Write MapReduce in JavaScript
  • JavaScript is not fast
  • Has limited data types
  • Hard to use complex analytic libs
• Adds load to data store

Alternatives to Hadoop: MongoDB Aggregation Framework

Great when:
• Doing SQL-style aggregation
• You don't require external data libs
• Extra load on the data store is OK
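For a sense of what "SQL-style aggregation" looks like in the aggregation framework, the sketch below expresses `SELECT cust_id, SUM(amount) ... WHERE status = 'A' GROUP BY cust_id` as a pipeline in Python, in the list-of-stages form accepted by PyMongo's `Collection.aggregate`. Running it for real needs a live server, so the sketch also includes a hypothetical pure-Python `group_sum` helper that computes the same result in memory; the `orders` collection, `db` handle and field names are made-up examples.

```python
from collections import defaultdict
from typing import Any, Dict, List

# SQL equivalent: SELECT cust_id, SUM(amount) AS total
#                 FROM orders WHERE status = 'A' GROUP BY cust_id
# With PyMongo this list would be passed as: db.orders.aggregate(PIPELINE)
PIPELINE: List[Dict[str, Any]] = [
    {"$match": {"status": "A"}},                                    # WHERE
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},  # GROUP BY + SUM
]


def group_sum(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """In-memory stand-in for PIPELINE, for checking the semantics only."""
    totals: Dict[Any, Any] = defaultdict(int)
    for doc in docs:
        if doc.get("status") == "A":                            # $match stage
            totals[doc.get("cust_id")] += doc.get("amount", 0)  # $group / $sum stage
    # MongoDB does not guarantee $group output order; this returns first-seen order.
    return [{"_id": key, "total": total} for key, total in totals.items()]
```

The pipeline runs inside the database, which is exactly the "extra load" trade-off the slide mentions.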

MongoDB + Pig: Motivations

Data storage and data processing are often separate concerns.

Hadoop is built for scalable processing of large datasets.

MongoDB, Pig: Similar Philosophy

Poly-structured data:
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure (it got its name because pigs are omnivorous)

Mortar: Fast Intro

• Open-source, code-based dev framework for data, built on Hadoop and Pig
• Inspired by Rails
• Self-contained, organized, executable projects

Load: Mongo => Pig

Mongo-Hadoop connector:

-- Load every document in the collection (no schema given)
raw_data = LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>' USING com.mongodb.hadoop.pig.MongoLoader();
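The connector locates the collection from a standard MongoDB connection string whose path is `<database>.<collection>`. If the username or password contains reserved characters such as `@`, `:` or `/`, the MongoDB connection-string format requires them to be percent-encoded. The helper below is a hypothetical Python sketch (not part of the Mongo-Hadoop connector) that assembles such a URI with `urllib.parse.quote`, for example when templating a Pig script.

```python
from urllib.parse import quote


def mongo_hadoop_uri(user: str, password: str, host: str, port: int,
                     database: str, collection: str) -> str:
    """Build 'mongodb://user:password@host:port/database.collection'."""
    # safe="" makes quote() also encode '/', so '@', ':' and '/' in credentials
    # cannot be mistaken for URI delimiters.
    return "mongodb://{}:{}@{}:{}/{}.{}".format(
        quote(user, safe=""), quote(password, safe=""),
        host, port, database, collection)
```

The resulting string is what would go between the quotes of the LOAD (or STORE) statement.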

Store: Pig => Mongo

STORE result INTO 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
    USING com.mongodb.hadoop.pig.MongoStorage(
        'update [key1, key2, key3]',
        '{key1: 1, key2: 1, key3: 1}, {unique: false, dropDups: false}');
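Assuming the `'update [key1, key2, key3]'` option behaves as a keyed upsert (a stored document whose key fields match an output record is updated, otherwise the record is inserted), its effect can be modeled in memory as below. This is an illustrative Python model with a hypothetical `upsert_by_keys` function, not the connector's implementation; whether matching documents are merged field by field or replaced wholesale is an assumption, and the second MongoStorage argument (an index on the same keys) is not modeled.

```python
from typing import Any, Dict, List, Sequence, Tuple

Doc = Dict[str, Any]


def upsert_by_keys(collection: List[Doc], records: List[Doc],
                   keys: Sequence[str]) -> List[Doc]:
    """Model of a keyed upsert: match on `keys`, update if found, else insert.

    Assumption: matching documents are updated field by field (like $set),
    not replaced wholesale.
    """
    # Index existing documents by the tuple of their key-field values.
    index: Dict[Tuple[Any, ...], Doc] = {
        tuple(doc.get(k) for k in keys): doc for doc in collection}
    for record in records:
        key = tuple(record.get(k) for k in keys)
        if key in index:
            index[key].update(record)   # existing document: update its fields
        else:
            new_doc = dict(record)      # no match: insert a copy
            collection.append(new_doc)
            index[key] = new_doc
    return collection
```

Under this model, storing the same batch twice leaves the collection unchanged the second time, which is the practical appeal of keyed updates over plain inserts when a job is re-run.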

What’s My Schema? Generate It

Pig is schema-optional.
• No schema: document#'user'#'name'
• With schema: user.name
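Without a schema, each loaded record arrives as a map, so a nested field is reached by chained map lookups (`#` in Pig), and a missing key simply yields null. With a schema, the same value is a named field reached with dot notation. The Python sketch below mirrors that difference using a plain dict versus hypothetical `User` and `Record` dataclasses; it illustrates the two access styles and is not Pig code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


def lookup(document: Dict[str, Any], *path: str) -> Optional[Any]:
    """Schema-less access, like document#'user'#'name': None if a key is missing."""
    value: Any = document
    for key in path:
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


@dataclass
class User:
    name: str


@dataclass
class Record:
    user: User


doc = {"user": {"name": "kyoung"}}
rec = Record(user=User(name="kyoung"))
print(lookup(doc, "user", "name"))  # no schema: chained map lookups, prints kyoung
print(rec.user.name)                # with schema: dot notation, prints kyoung
```

The schema-less form tolerates documents of any shape; the schema form trades that flexibility for shorter, checked field access.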

What’s in the Collection? Characterize It

A Hadoop-based utility describes your collection:

• Field name

• Unique value count

• Example value

• Data type

• Example value count
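The utility itself runs as a Hadoop job over the whole collection. As a single-machine illustration of the statistics listed above, the hypothetical Python function `characterize` below flattens nested documents into dotted field names and, for each field, counts unique values, picks an example value, records the type name(s) seen, and reports how often the example occurs. Picking the most frequent value as the example is an assumption; the real utility may choose differently.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def flatten(doc: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.field.name, value) pairs, descending into nested documents."""
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, name + ".")
        else:
            yield name, value


def characterize(docs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    counts: Dict[str, Counter] = defaultdict(Counter)         # field -> repr(value) -> count
    originals: Dict[str, Dict[str, Any]] = defaultdict(dict)  # field -> repr(value) -> value
    types: Dict[str, set] = defaultdict(set)                  # field -> type names seen
    for doc in docs:
        for name, value in flatten(doc):
            key = repr(value)  # repr() lets unhashable values such as lists be counted
            counts[name][key] += 1
            originals[name].setdefault(key, value)
            types[name].add(type(value).__name__)
    report = []
    for name in sorted(counts):
        example_key, example_count = counts[name].most_common(1)[0]
        report.append({
            "field": name,
            "unique_values": len(counts[name]),
            "example_value": originals[name][example_key],  # most frequent value
            "data_type": "/".join(sorted(types[name])),
            "example_value_count": example_count,
        })
    return report
```

Reporting several type names for one field (e.g. `int/str`) is a quick way to spot inconsistently written documents before writing a Pig schema.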

Appendix: Links

Reference:
http://help.mortardata.com/reference/loading_and_storing_data/MongoDB

Mongo-Hadoop connector:
https://github.com/mortardata/mongo-hadoop

@kky, @mortardata

help.mortardata.com
