Hadoop - Simple. Scalable.

Date post: 15-May-2015
Upload: elliando-dias
Transcript
Page 1: Hadoop - Simple. Scalable.

Hadoop

Simple. Scalable.

Page 2: Hadoop - Simple. Scalable.

@markgunnels

[email protected]

Page 3: Hadoop - Simple. Scalable.

Java. Clojure. Ruby.

Cloudera Certified

Page 4: Hadoop - Simple. Scalable.

posscon.org

April 15, 16, and 17

Page 5: Hadoop - Simple. Scalable.

Agenda

Overview
Massively Large Data Sets and the problems therein
Distributed File System
MapReduce
Pig

Page 6: Hadoop - Simple. Scalable.

Overview

Page 7: Hadoop - Simple. Scalable.

Doug Cutting

Genius

Page 8: Hadoop - Simple. Scalable.

Favorite Hadoop Story

New York Times

Page 9: Hadoop - Simple. Scalable.

4 Terabytes of Source Articles.

Page 10: Hadoop - Simple. Scalable.

24 Hours.

Page 11: Hadoop - Simple. Scalable.

5.5 Terabytes of PDFs.

Page 12: Hadoop - Simple. Scalable.

Did it again.

Page 13: Hadoop - Simple. Scalable.

$240.

Page 14: Hadoop - Simple. Scalable.

Infoporn from Yahoo

73 hours
490 TB Shuffling
280 TB Output
4000 Nodes
16 PB Disk Space
32K Cores
64 TB RAM

Page 15: Hadoop - Simple. Scalable.

Hadoop solves...

Page 16: Hadoop - Simple. Scalable.

Analyzing Massively Large Datasets

Page 17: Hadoop - Simple. Scalable.

Two Problems

You have to distribute.

Page 18: Hadoop - Simple. Scalable.

Data Storage

Disk capacity has grown far faster than disk read speeds. Datasets won't fit on one disk. Tolerate node failure.

Page 19: Hadoop - Simple. Scalable.

Data Analysis

Combine data from many machines. Tolerate node failure.

Page 20: Hadoop - Simple. Scalable.

How Hadoop solves these problems.

Page 21: Hadoop - Simple. Scalable.

Send Code to Data. Not Data to Code.

Page 22: Hadoop - Simple. Scalable.

Data Storage

HDFS

Page 23: Hadoop - Simple. Scalable.

Name Node. Data Nodes.

Master - Slave Relationship

Page 24: Hadoop - Simple. Scalable.

Shard massive files across multiple machines.

MB, GB, and TB

Page 25: Hadoop - Simple. Scalable.

Tolerant of Node Failure

Each block of a file is replicated across 3 nodes by default.
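
The sharding and replication idea can be sketched in a few lines of Python. This is a toy model, not how HDFS is implemented: the 64 MB block size, the node names, and the round-robin placement are assumptions chosen for illustration (real HDFS placement is rack-aware and decided by the Name Node).

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(blocks)
    }

def replicas_after_failure(placement, failed_node):
    """Drop one node and report which replicas of each block remain."""
    return {block: [n for n in holders if n != failed_node]
            for block, holders in placement.items()}

# Hypothetical 200 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(200 * MB, 64 * MB)
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
after_failure = replicas_after_failure(placement, "node2")

for block, holders in after_failure.items():
    print(block, holders)
```

Losing any single node still leaves every block with at least two live copies, which is the point of the slide.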

Page 26: Hadoop - Simple. Scalable.

HDFS behaves like a normal file system.

No true appends yet.

Page 27: Hadoop - Simple. Scalable.

Demonstration.

Page 28: Hadoop - Simple. Scalable.

Data Analysis

MapReduce

Page 29: Hadoop - Simple. Scalable.

JobTracker. TaskTrackers.

Master - Slave Relationship.

Page 30: Hadoop - Simple. Scalable.

map

Page 31: Hadoop - Simple. Scalable.

Demonstration
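
The live demonstration is not part of the transcript. As a stand-in, here is a minimal sketch of `map` in Python: apply one function to every element, independently. The function `count_words` and the sample lines are made up for illustration.

```python
def count_words(line):
    """Number of whitespace-separated words in one line."""
    return len(line.split())

lines = ["the quick brown fox", "the lazy dog", "the end"]

# map applies count_words to each line; no call depends on any other.
counts = list(map(count_words, lines))
print(counts)  # [4, 3, 2]
```

Because each call is independent, the work can be split across workers without changing the result.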

Page 32: Hadoop - Simple. Scalable.

pmap

Page 33: Hadoop - Simple. Scalable.

Demonstration
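
The live demonstration is not part of the transcript. As a rough Python analogue of a parallel map, the sketch below runs the same function over the inputs on a thread pool. The helper name `pmap` mirrors the slide; the worker count and sample data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(line):
    """Number of whitespace-separated words in one line."""
    return len(line.split())

def pmap(fn, items, workers=4):
    """Apply fn to every item on a pool of worker threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

lines = ["the quick brown fox", "the lazy dog", "the end"]
parallel_counts = pmap(count_words, lines)
print(parallel_counts)  # [4, 3, 2]
```

The result matches a plain `map`; only the scheduling differs. Hadoop applies the same idea across machines rather than threads.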

Page 34: Hadoop - Simple. Scalable.

reduce

Page 35: Hadoop - Simple. Scalable.

Demonstration
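
The live demonstration is not part of the transcript. A minimal Python sketch of `reduce`: fold a sequence of values into a single result with a two-argument function. The sample numbers are made up.

```python
from functools import reduce
from operator import add

word_counts = [4, 3, 2]

# reduce combines values pairwise: ((0 + 4) + 3) + 2
total = reduce(add, word_counts, 0)
print(total)  # 9
```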

Page 36: Hadoop - Simple. Scalable.

(reduce (pmap))

Page 37: Hadoop - Simple. Scalable.

Demonstration.
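
The live demonstration is not part of the transcript. The composition on the slide can be sketched in Python as a small word count: a parallel map produces one partial tally per chunk, and a reduce merges the tallies. The names `pmap` and `tally`, the thread pool, and the sample chunks are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def tally(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.split())

def pmap(fn, items, workers=4):
    """Apply fn to every item on a pool of worker threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

def merge(left, right):
    """Reduce step: combine two partial tallies."""
    return left + right

chunks = ["the quick brown fox", "the lazy dog", "the end"]
word_count = reduce(merge, pmap(tally, chunks), Counter())
print(word_count["the"])  # 3
```

The merge step is associative, so partial tallies can be combined in any grouping. That property is what lets the reduce side be distributed as well.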

Page 38: Hadoop - Simple. Scalable.

MapReduce

Java

Page 39: Hadoop - Simple. Scalable.

Nobody likes it.

:-)

Page 40: Hadoop - Simple. Scalable.

MapReduce

Ruby. Python. Unix Utilities.
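
Scripting languages can take part because Hadoop Streaming runs any program that reads lines on standard input and writes tab-separated key/value lines on standard output. The sketch below shows a word-count mapper and reducer in that style, with a local stand-in for the sort that Hadoop performs between the two phases. The function names and `run_locally` are hypothetical helpers, not part of Hadoop.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_locally(lines):
    """Simulate map, sort, reduce in one process (stand-in for the cluster)."""
    return list(reducer(sorted(mapper(lines))))

result = run_locally(["the quick brown fox", "the lazy dog"])
print("\n".join(result))
```

In a real streaming job the mapper and reducer would be separate scripts that loop over `sys.stdin` and `print` their output; the logic inside stays the same.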

Page 41: Hadoop - Simple. Scalable.

MapReduce

Clojure

Page 42: Hadoop - Simple. Scalable.

Hadoop Ecosystem

Pigkeeper. Hive. Cascading.

Page 43: Hadoop - Simple. Scalable.

Pig

Page 44: Hadoop - Simple. Scalable.

HBase

