
MongoDB & Hadoop, Sittin' in a Tree

Date post: 12-May-2015
Transcript
Page 1: MongoDB & Hadoop, Sittin' in a Tree

K Young - CEO, Mortar

MongoDB + Hadoop sittin’ in a tree

Page 2: MongoDB & Hadoop, Sittin' in a Tree

OF THIS SESSION

Overview

• Super-fast intro to Hadoop, Pig
• Why MongoDB + Pig?
• Demo: moving data MongoDB <=> Pig
• Demo: processing data with Pig

Page 3: MongoDB & Hadoop, Sittin' in a Tree

SUPER-FAST INTRO

Hadoop

• From Google research
• Built for massive parallelization
• Batch (for now)
• Widely applicable

Page 4: MongoDB & Hadoop, Sittin' in a Tree

SUPER-FAST INTRO

Hadoop

Page 5: MongoDB & Hadoop, Sittin' in a Tree

Social Graph

Page 6: MongoDB & Hadoop, Sittin' in a Tree

Predict

Page 7: MongoDB & Hadoop, Sittin' in a Tree

Detect

Page 8: MongoDB & Hadoop, Sittin' in a Tree

Genetics

Page 9: MongoDB & Hadoop, Sittin' in a Tree

SUPER-FAST INTRO

Hadoop

Page 10: MongoDB & Hadoop, Sittin' in a Tree

ON HADOOP

Pig

• Less code
• Expressive code
• Compiles to MapReduce (MR)
• Insulates from API
• Popular (LinkedIn, Twitter, Salesforce, Yahoo, Stanford University...)

Page 11: MongoDB & Hadoop, Sittin' in a Tree

BRIEF, EXPRESSIVE

LIKE PROCEDURAL SQL

Pig

(thanks: twitter hadoop world presentation)
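
The script shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of what "procedural SQL" can look like in Pig. The input path, output path, and field names are all hypothetical and are not taken from the original slide.

```pig
-- Hypothetical input: tab-separated page views (user, url, seconds).
views = LOAD 'input/page_views.tsv' USING PigStorage('\t')
        AS (user:chararray, url:chararray, seconds:int);

-- Each statement names an intermediate relation, so the script reads
-- top to bottom like a sequence of SQL clauses.
long_views = FILTER views BY seconds > 30;
by_user    = GROUP long_views BY user;
totals     = FOREACH by_user GENERATE
                 group AS user,
                 COUNT(long_views) AS view_count,
                 SUM(long_views.seconds) AS total_seconds;
ranked     = ORDER totals BY total_seconds DESC;
top_users  = LIMIT ranked 10;

STORE top_users INTO 'output/top_users' USING PigStorage('\t');
```

The filter, group, aggregate, and sort steps each take one line here; Pig compiles them into MapReduce jobs, so none of the Hadoop API appears in the script.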

Page 12: MongoDB & Hadoop, Sittin' in a Tree

FOR SERIOUS

The Same Script, In MapReduce

Page 13: MongoDB & Hadoop, Sittin' in a Tree

Alternatives to Hadoop

Write MapReduce in JavaScript
• JavaScript is not fast
• Has limited data types
• Hard to use complex analytic libs
• Adds load to data store

MONGODB NATIVE MAPREDUCE

Page 14: MongoDB & Hadoop, Sittin' in a Tree

Alternatives to Hadoop

MONGODB AGGREGATION FRAMEWORK

Great when
• Doing SQL-style aggregation
• External data libs are not required
• Extra load is OK

Page 15: MongoDB & Hadoop, Sittin' in a Tree

MOTIVATIONS

MongoDB + Pig

Data storage and data processing are often separate concerns

Hadoop is built for scalable processing of large datasets

Page 16: MongoDB & Hadoop, Sittin' in a Tree

SIMILAR PHILOSOPHY

MongoDB, Pig

Poly-structured data
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure (got its name because pigs are omnivorous)

Page 17: MongoDB & Hadoop, Sittin' in a Tree

Mortar

FAST INTRO

Open-source code-based dev framework for data, built on Hadoop and Pig

Inspired by Rails

Self-contained, organized, executable projects

Page 19: MongoDB & Hadoop, Sittin' in a Tree

LOAD

MONGO => PIG

Mongo-Hadoop connector

-- Load a MongoDB collection into a Pig relation (the alias "data" is illustrative)
data = LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
       USING com.mongodb.hadoop.pig.MongoLoader();

Page 20: MongoDB & Hadoop, Sittin' in a Tree

STORE

PIG => MONGO

STORE result INTO 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
      USING com.mongodb.hadoop.pig.MongoStorage(
          'update [key1, key2, key3]',
          '{key1: 1, key2: 1, key3: 1}, {unique:false, dropDups: false}');

Page 21: MongoDB & Hadoop, Sittin' in a Tree

What’s my schema?

GENERATE IT

Pig is schema-optional.
• No schema: document#'user'#'name'
• With schema: user.name
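
The two access styles above can be sketched side by side. This assumes the mongo-hadoop connector jars are already registered with Pig, that the collection contains an embedded `user` document with a `name` field, and that the loader accepts a Pig schema string as its argument; the relation aliases are hypothetical and the connection placeholders must be filled in before running.

```pig
-- No schema: each record arrives as a single map field named "document",
-- so nested values are reached with chained map lookups.
raw = LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
      USING com.mongodb.hadoop.pig.MongoLoader();
names_from_map = FOREACH raw GENERATE document#'user'#'name' AS name;

-- With schema (assumed form): fields are declared up front, so the
-- embedded document can be dereferenced with dot notation.
typed = LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
        USING com.mongodb.hadoop.pig.MongoLoader('user:tuple(name:chararray)');
names_from_schema = FOREACH typed GENERATE user.name AS name;
```

The schemaless form works on any collection without preparation; the schema form trades that flexibility for typed fields and shorter expressions.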

Page 22: MongoDB & Hadoop, Sittin' in a Tree

What’s in the collection?

CHARACTERIZE IT

Hadoop-based utility describes your collection

• Field name

• Unique value count

• Example value

• Data type

• Example value count
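
To give a feel for what such a report contains, here is a rough sketch that computes some of these statistics for one field by hand. It is not the utility itself: the field name `status` is hypothetical, values are treated as strings, data type detection is left out, and the connector jars are assumed to be registered.

```pig
raw = LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
      USING com.mongodb.hadoop.pig.MongoLoader();

-- Pull one (hypothetical) field out of the schemaless document map.
vals    = FOREACH raw GENERATE (chararray) document#'status' AS val;
present = FILTER vals BY val IS NOT NULL;

-- One row per distinct value, with the number of times it occurs.
by_val     = GROUP present BY val;
val_counts = FOREACH by_val GENERATE
                 group AS example_value,
                 COUNT(present) AS example_value_count;

-- Collapse to a single summary row: field name, unique value count,
-- and the most common value with its count.
all_vals = GROUP val_counts ALL;
summary  = FOREACH all_vals {
    ordered = ORDER val_counts BY example_value_count DESC;
    top     = LIMIT ordered 1;
    GENERATE 'status' AS field_name,
             COUNT(val_counts) AS unique_value_count,
             FLATTEN(top);
};

DUMP summary;
```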

Page 23: MongoDB & Hadoop, Sittin' in a Tree

Appendix

LINKS

Reference:

http://help.mortardata.com/reference/loading_and_storing_data/MongoDB

Mongo-Hadoop connector

https://github.com/mortardata/mongo-hadoop

Page 24: MongoDB & Hadoop, Sittin' in a Tree

@kky

@mortardata

help.mortardata.com

Page 25: MongoDB & Hadoop, Sittin' in a Tree

Lunch 1:20 – 2:05 Next Sessions at 2:05 5th Floor:

West Side Ballroom 3&4: How to Keep Your Data Safe in MongoDB

West Side Ballroom 1&2: Geospatial Enhancements in MongoDB 2.4

Juilliard Complex: Business Track: How MongoDB Helps Telefonica Digital Accelerate Time to Market

Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session

7th Floor:

Empire Complex: Real-Time Integration Between MongoDB and SQL Databases

SoHo Complex: High Performance, Scalable MongoDB in a Bare Metal Cloud

