Page 1: Extending Hadoop for Fun & Profit

Extending Hadoop for Fun & Profit

Milind Bhandarkar, Chief Scientist, Pivotal Software

(Twitter : @techmilind)

Page 2: Extending Hadoop for Fun & Profit

About Me

• http://www.linkedin.com/in/milindb

• Founding member of Hadoop team at Yahoo! [2005-2010]

• Contributor to Apache Hadoop since v0.1

• Built and led Grid Solutions Team at Yahoo! [2007-2010]

• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)

• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)

Page 3: Extending Hadoop for Fun & Profit

Agenda

• Extending MapReduce

• Functionality

• Performance

• Beyond MapReduce with YARN

• Hamster & GraphLab

• Extending HDFS

• Q & A

Page 4: Extending Hadoop for Fun & Profit

Extending MapReduce

Page 5: Extending Hadoop for Fun & Profit

MapReduce Overview

• Record = (Key, Value)

• Key : Comparable, Serializable

• Value: Serializable

• Logical Phases: Input, Map, Shuffle, Reduce, Output

Page 6: Extending Hadoop for Fun & Profit

Map

• Input: (Key1, Value1)

• Output: List(Key2, Value2)

• Projections, Filtering, Transformation

Page 7: Extending Hadoop for Fun & Profit

Shuffle

• Input: List(Key2, Value2)

• Output

• Sort(Partition(List(Key2, List(Value2))))

• Provided by Hadoop : Several Customizations Possible

Page 8: Extending Hadoop for Fun & Profit

Reduce

• Input: List(Key2, List(Value2))

• Output: List(Key3, Value3)

• Aggregations
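To make these (Key, Value) signatures concrete, here is a minimal word-count-style Mapper and Reducer against the classic org.apache.hadoop.mapred API used elsewhere in this deck; the class names are illustrative, not part of the talk.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Map: (Key1, Value1) -> List(Key2, Value2); here (offset, line) -> (word, 1)
public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        output.collect(word, ONE);   // projection + transformation
      }
    }
  }
}

// Reduce: (Key2, List(Value2)) -> List(Key3, Value3); here aggregation by sum
class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}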

Page 9: Extending Hadoop for Fun & Profit

MapReduce DataFlow

Page 10: Extending Hadoop for Fun & Profit

Configuration

• Unified Mechanism for

• Configuring Daemons

• Runtime environment for Jobs/Tasks

• Defaults: *-default.xml

• Site-Specific: *-site.xml

• final parameters (cannot be overridden by later-loaded configuration resources)

Page 11: Extending Hadoop for Fun & Profit

Example

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>head.server.node.com:9001</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://head.server.node.com:9000</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <final>true</final>
  </property>
  ...
</configuration>
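A brief sketch of how this configuration surfaces in job code (assuming the standard resource-loading order; the values in the comments are the ones from the XML above):

import org.apache.hadoop.mapred.JobConf;

public class ConfigExample {
  public static void main(String[] args) {
    // Defaults (*-default.xml) are loaded first, then site overrides (*-site.xml)
    JobConf conf = new JobConf(ConfigExample.class);
    System.out.println(conf.get("mapred.job.tracker"));   // head.server.node.com:9001
    System.out.println(conf.get("fs.default.name"));      // hdfs://head.server.node.com:9000
    // mapred.child.java.opts is marked <final>true</final> above, so
    // later-loaded resources (e.g. a user-submitted job config) cannot override it.
  }
}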

Page 12: Extending Hadoop for Fun & Profit

Extending Input Phase

• Convert ByteStream to List(Key, Value)

• Several Formats pre-packaged

• TextInputFormat<LongWritable, Text>

• SequenceFileInputFormat<K,V>

• KeyValueTextInputFormat<Text,Text>

• Specify InputFormat for each job

• JobConf.setInputFormat()
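For example, selecting one of the pre-packaged formats on the classic mapred API might look like this (the driver class and job name are illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class InputFormatSetup {
  public static JobConf configure(String inputDir) {
    JobConf job = new JobConf(InputFormatSetup.class);
    job.setJobName("parse-kv-logs");                    // illustrative name
    // Choose how the input byte stream becomes (Key, Value) records:
    job.setInputFormat(KeyValueTextInputFormat.class);  // Text keys, Text values
    FileInputFormat.setInputPaths(job, new Path(inputDir));
    return job;
  }
}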

Page 13: Extending Hadoop for Fun & Profit

InputFormat

• getSplits() : From Input descriptors, get Input Splits, such that each Split can be processed independently

• <FileName, startOffset, length>

• getRecordReader() : From an InputSplit, get list of Records

Page 14: Extending Hadoop for Fun & Profit

Industry Use Case

Surveillance Video Anomaly Detection

Page 15: Extending Hadoop for Fun & Profit

Acknowledgements

• Victor Fang

• Regu Radhakrishnan

• Derek Lin

• Sameer Tiwari

Page 16: Extending Hadoop for Fun & Profit

Anomaly Detection in Surveillance Video

• Detect anomalous objects in a restricted perimeter

• A typical large enterprise collects TBs of video per day

• Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events

• Post-Incident monitoring enabled by Interactive Query

Page 17: Extending Hadoop for Fun & Profit

Video DataFlow

• Timestamped Video Files as input

• Distributed Video Transcoding : ETL in Hadoop

• Distributed Video Analytics in Hadoop/HAWQ

• Insights in relational DB

Page 18: Extending Hadoop for Fun & Profit

Real World Video Data

• Benchmark Surveillance videos from UK Home Office (iLids)

• CCTV video footage depicting scenarios central to government requirements

Page 19: Extending Hadoop for Fun & Profit

Common Video Standards

• MPEG & ITU responsible for most video standards

• MPEG-2 (1995): widely adopted in DVDs, TV, set-top boxes

Page 20: Extending Hadoop for Fun & Profit

MPEG Standard Format

• Sequence of encoded video frames

• Compression by eliminating:

• Redundancy in Time: Inter-Frame Encoding

• Redundancy in Space: Intra-Frame Encoding

Page 21: Extending Hadoop for Fun & Profit

Motion Compensation

• I-Frame: Intra-Frame encoding

• P-Frame: Predicted frame from previous frame

• B-Frame: Predicted frame from both previous & next frame

Page 22: Extending Hadoop for Fun & Profit

Distributed MPEG Decoding

• HDFS splits large files in 64 MB/128 MB blocks

• Each HDFS block can be processed independently by a Map task

• Can we decode individual video frames from an arbitrary HDFS block in an MPEG File ?

Page 23: Extending Hadoop for Fun & Profit

Splitting MPEG-2

• Header Information available only once per file

• Group of Pictures (GOP) header repeats

• Each GOP starts with an I-Frame and ends with an I-Frame

• Each GOP can be decoded independently

• First and last GOP may straddle HDFS blocks

Page 24: Extending Hadoop for Fun & Profit

MPEG2InputFormat

• Derived from FileInputFormat

• getSplits() : Identical to FileInputFormat

• InputSplit = HDFS Block

• getRecordReader() : MPEG2RecordReader
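A sketch of what this could look like on the classic mapred API. MPEG2InputFormat and MPEG2RecordReader are the names from these slides, but the code is illustrative rather than the actual implementation; the record reader is sketched after the next slide.

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Key: byte offset of the decoded frame, Value: raw frame bytes (an illustrative choice)
public class MPEG2InputFormat extends FileInputFormat<LongWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path filename) {
    // Keep the default block-per-split behaviour; getSplits() is inherited unchanged
    return true;
  }

  @Override
  public RecordReader<LongWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new MPEG2RecordReader((FileSplit) split, job);
  }
}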

Page 25: Extending Hadoop for Fun & Profit

MPEG2RecordReader

• Start from beginning of block

• Search for the first GOP Header

• Locate an I-Frame, decode, keep in memory

• If P-Frame, decode using last frame

• If B-Frame, keep current frame in memory, read next frame, decode current frame
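Continuing the sketch, the record reader scans from its split's start offset to the first GOP header and then emits one decoded frame per record; the Mpeg2Decoder interface here is hypothetical, standing in for whatever native decoder the real implementation wraps.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;

public class MPEG2RecordReader implements RecordReader<LongWritable, BytesWritable> {

  /** Hypothetical decoder API; a real implementation would wrap a native MPEG-2 library. */
  interface Mpeg2Decoder {
    void seekToNextGopHeader() throws IOException; // scan forward for the GOP start code
    byte[] decodeNextFrame() throws IOException;   // I: decode directly; P: uses previous frame;
                                                   // B: buffered until the next reference frame
    long gopStartOffset();                         // file offset of the GOP being decoded
  }

  private final long start;
  private final long end;
  private final FSDataInputStream in;
  private final Mpeg2Decoder decoder;

  public MPEG2RecordReader(FileSplit split, JobConf job) throws IOException {
    start = split.getStart();
    end = start + split.getLength();
    FileSystem fs = split.getPath().getFileSystem(job);
    in = fs.open(split.getPath());
    in.seek(start);
    decoder = createDecoder(in);
    // Frames before the first GOP header in this block belong to the previous split's GOP
    decoder.seekToNextGopHeader();
  }

  public boolean next(LongWritable key, BytesWritable value) throws IOException {
    // Stop once the current GOP starts at or beyond this split's end;
    // that straddling GOP is handled by the reader of the next block.
    if (decoder.gopStartOffset() >= end) return false;
    byte[] frame = decoder.decodeNextFrame();
    if (frame == null) return false;
    key.set(in.getPos());
    value.set(frame, 0, frame.length);
    return true;
  }

  public LongWritable createKey() { return new LongWritable(); }
  public BytesWritable createValue() { return new BytesWritable(); }
  public long getPos() throws IOException { return in.getPos(); }
  public float getProgress() throws IOException {
    return Math.min(1.0f, (getPos() - start) / (float) Math.max(1, end - start));
  }
  public void close() throws IOException { in.close(); }

  private Mpeg2Decoder createDecoder(FSDataInputStream stream) {
    throw new UnsupportedOperationException("plug a real MPEG-2 decoder in here");
  }
}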

Page 26: Extending Hadoop for Fun & Profit

Considerations for Input Format

• Use as little metadata as possible

• Number of Splits = Number of Map Tasks

• Combine small files

• Split determination happens in a single process, so it should be based on metadata only

• Affects scalability of MapReduce

Page 27: Extending Hadoop for Fun & Profit

Scalability

• If one node processes k MB/s, then N nodes should process (k*N) MB/s

• If a fixed amount of data is processed in T minutes on one node, then N nodes should process the same data in (T/N) minutes

• Linear Scalability

Page 28: Extending Hadoop for Fun & Profit

Reduce Latency

Minimize job execution time

Page 29: Extending Hadoop for Fun & Profit

Increase Throughput

Maximize amount of data processed per unit time

Page 30: Extending Hadoop for Fun & Profit

Amdahl’s Law

S = \frac{N}{1 + \sigma\,(N - 1)}   (σ: serial fraction of the computation)

Page 31: Extending Hadoop for Fun & Profit

Multi-Phase Computations

• If computation C is split into N different parts, C1..CN

• If each partial computation Ci can be sped up by a factor of Si, the overall speedup S is as restated on the next slide

Page 32: Extending Hadoop for Fun & Profit

Amdahl’s Law, Restated

S = \frac{\sum_{i=1}^{N} C_i}{\sum_{i=1}^{N} C_i / S_i}

Page 33: Extending Hadoop for Fun & Profit

Amdahl’s Law

• Suppose a job has 5 phases: P0 is 10 seconds, P1, P2, P3 are 200 seconds each, and P4 is 10 seconds

• Sequential runtime = 620 seconds

• P1, P2, P3 parallelized on 100 machines with speedup of 80 (each executes in 2.5 seconds)

• After parallelization, runtime = 27.5 seconds

• Effective Speedup: (620s / 27.5s) = 22.5
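Plugging the five phases into the restated formula confirms the arithmetic:

S = \frac{\sum_i C_i}{\sum_i C_i / S_i}
  = \frac{10 + 200 + 200 + 200 + 10}{\frac{10}{1} + \frac{200}{80} + \frac{200}{80} + \frac{200}{80} + \frac{10}{1}}
  = \frac{620}{27.5} \approx 22.5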

Page 34: Extending Hadoop for Fun & Profit

MapReduce Workflow

Page 35: Extending Hadoop for Fun & Profit

Extending Shuffle

Page 36: Extending Hadoop for Fun & Profit

Why Shuffle ?

• Often, the most expensive phase in MapReduce, involves slow disks and network

• Map tasks partition, sort and serialize outputs, and write to local disk

• Reduce tasks pull individual Map outputs over network, merge, and may spill to disk

Page 37: Extending Hadoop for Fun & Profit

Message Cost Model

T = α + Nβ   (α: per-message latency; β: transfer time per byte, i.e. 1/bandwidth; N: message size)

Page 38: Extending Hadoop for Fun & Profit

Message Granularity

• For Gigabit Ethernet

• α = 300 μs

• β = 100 MB/s

• 100 Messages of 10KB each = 40 ms

• 10 Messages of 100 KB each = 13 ms
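Both figures follow from the cost model, reading the quoted β as link bandwidth so that each message costs α plus (message size / bandwidth):

T_{100 \times 10\,\mathrm{KB}} = 100 \left(300\,\mu\mathrm{s} + \frac{10\,\mathrm{KB}}{100\,\mathrm{MB/s}}\right) = 100 \times 400\,\mu\mathrm{s} = 40\,\mathrm{ms}

T_{10 \times 100\,\mathrm{KB}} = 10 \left(300\,\mu\mathrm{s} + \frac{100\,\mathrm{KB}}{100\,\mathrm{MB/s}}\right) = 10 \times 1.3\,\mathrm{ms} = 13\,\mathrm{ms}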

Page 39: Extending Hadoop for Fun & Profit

Alpha-Beta

• Common Mistake: Assuming that α is constant

• Scheduling latency for responder

• MR daemons’ time slice is inversely proportional to the number of concurrent tasks

• Common Mistake: Assuming that β is constant

• Network congestion

• TCP incast

Page 40: Extending Hadoop for Fun & Profit

Efficient Hardware Platforms

• Mellanox - Hadoop Acceleration through Network-assisted Merge

• RoCE - Brocade, Cisco, Extreme, Arista...

• SSD - Velobit, Violin, FusionIO, Samsung..

• Niche - Compression, Encryption...

Page 41: Extending Hadoop for Fun & Profit

Pluggable Shuffle & Sort

• Replace HTTP-based pull with RDMA

• Avoid spilling altogether

• Replace default Sort implementation with Job-optimized sorting algorithm

• Experimental APIs

• See PluggableShuffleAndPluggableSort.html in the Hadoop documentation
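The experimental hooks are exposed as job-level configuration properties named on that documentation page; a sketch of wiring in a custom implementation (the com.example classes are placeholders, not real plugins):

<configuration>
  <!-- Reduce side: replaces the HTTP-based shuffle consumer (e.g. with an RDMA client) -->
  <property>
    <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
    <value>com.example.RdmaShuffleConsumerPlugin</value>
  </property>
  <!-- Map side: replaces the default sort/spill output collector -->
  <property>
    <name>mapreduce.job.map.output.collector.class</name>
    <value>com.example.CustomMapOutputCollector</value>
  </property>
</configuration>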

Page 42: Extending Hadoop for Fun & Profit

Mellanox UDA

• Developed jointly with Auburn University

• 2x Performance on TeraSort

• Reduces disk writes by 45%, disk reads by 15%

Page 43: Extending Hadoop for Fun & Profit

Syncsort DMX-h

Page 44: Extending Hadoop for Fun & Profit

Beyond MapReduce with YARN

Page 45: Extending Hadoop for Fun & Profit

Hadoop 1.0 (Image Courtesy Arun Murthy, Hortonworks): siloed clusters, each dedicated to a single application type (batch, interactive, or online) on its own HDFS

Page 46: Extending Hadoop for Fun & Profit

MapReduce 1.0 (Image Courtesy Arun Murthy, Hortonworks)

Page 47: Extending Hadoop for Fun & Profit

Hadoop 2.0 (Image Courtesy Arun Murthy, Hortonworks)

HADOOP 1.0: HDFS (redundant, reliable storage) → MapReduce (cluster resource management & data processing) → Pig (data flow), Hive (SQL), others (Cascading)

HADOOP 2.0: HDFS2 (redundant, reliable storage) → YARN (cluster resource management) → Tez (execution engine), MR (batch), RT Stream/Graph (Storm, Giraph), Services (HBase) → Pig (data flow), Hive (SQL), others (Cascading)

Page 48: Extending Hadoop for Fun & Profit

Applications Run Natively IN Hadoop

HDFS2 (redundant, reliable storage) · YARN (cluster resource management) · BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search, Weave, …)

YARN Platform (Image Courtesy Arun Murthy, Hortonworks)

Page 49: Extending Hadoop for Fun & Profit

YARN Architecture (Image Courtesy Arun Murthy, Hortonworks): a single ResourceManager (with Scheduler) serves clients; per-application ApplicationMasters (AM 1, AM 2) negotiate containers that run on NodeManagers across the cluster

Page 50: Extending Hadoop for Fun & Profit

YARN

• Yet Another Resource Negotiator

• Resource Manager

• Node Managers

• Application Masters

• Specific to paradigm, e.g. MR Application master (aka JobTracker)
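As a rough illustration of this division of labour, here is a minimal Application Master skeleton using the YARN client API (error handling, container launch, and completion tracking omitted; resource sizes are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MinimalAppMaster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();

    // Register this Application Master with the Resource Manager
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(conf);
    rm.start();
    rm.registerApplicationMaster("", 0, "");

    // Ask the Resource Manager for containers; Node Managers will launch them
    Resource capability = Resource.newInstance(1024 /* MB */, 1 /* vcores */);
    rm.addContainerRequest(new ContainerRequest(capability, null, null,
                                                Priority.newInstance(0)));

    // A heartbeat/allocate loop would go here: rm.allocate(progress) returns
    // newly allocated containers, which are handed to an NMClient to launch tasks.

    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    rm.stop();
  }
}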

Page 51: Extending Hadoop for Fun & Profit

Beyond MapReduce

• Apache Giraph - BSP & Graph Processing

• Storm on Yarn - Streaming Computation

• HOYA - HBase on Yarn

• Hamster - MPI on Hadoop

• More to come ...

Page 52: Extending Hadoop for Fun & Profit

Hamster

• Hadoop and MPI on the same cluster

• OpenMPI Runtime on Hadoop YARN

• Hadoop Provides: Resource Scheduling, Process monitoring, Distributed File System

• Open MPI Provides: Process launching, Communication, I/O forwarding

Page 53: Extending Hadoop for Fun & Profit

Hamster Components

• Hamster Application Master

• Gang Scheduler, YARN Application Preemption

• Resource Isolation (lxc Containers)

• ORTE: Hamster Runtime

• Process launching, Wireup, Interconnect

Page 54: Extending Hadoop for Fun & Profit

Hamster Architecture (diagram): the Client submits to the ResourceManager (Scheduler, AMService); the MPI ApplicationMaster (with its own scheduler and the HNP) negotiates containers over the RM-AM and AM-NM protocols; MPI processes and framework daemons run in containers on the NodeManagers (with auxiliary services)

Page 55: Extending Hadoop for Fun & Profit

Hamster Scalability

• Sufficient for small to medium HPC workloads

• Job launch time gated by YARN resource scheduler

           Launch    WireUp    Collectives  Monitor
OpenMPI    O(logN)   O(logN)   O(logN)      O(logN)
Hamster    O(N)      O(logN)   O(logN)      O(logN)

Page 56: Extending Hadoop for Fun & Profit

GraphLab + Hamster on Hadoop


Page 57: Extending Hadoop for Fun & Profit

About GraphLab

• Graph-based, High-Performance distributed computation framework

• Started by Prof. Carlos Guestrin at CMU in 2009

• Recently founded GraphLab Inc. to commercialize GraphLab.org

Page 58: Extending Hadoop for Fun & Profit

GraphLab Features

• Topic Modeling (e.g. LDA)

• Graph Analytics (PageRank, Triangle counting)

• Clustering (K-Means)

• Collaborative Filtering

• Linear Solvers

• etc...

Page 59: Extending Hadoop for Fun & Profit

Graphs Alone are not Enough

• A full data-processing workflow requires ETL/postprocessing, visualization, data wrangling, and serving

• MapReduce excels at data wrangling

• OLTP/NoSQL Row-Based stores excel at Serving

• GraphLab should co-exist with other Hadoop frameworks

Page 60: Extending Hadoop for Fun & Profit

Coming Soon…

Page 61: Extending Hadoop for Fun & Profit

Extending HDFS

Page 62: Extending Hadoop for Fun & Profit

HCFS

• Hadoop Compatible File Systems

• FileSystem, FileContext

• S3, Local FS, webhdfs

• Azure Blob Storage, CassandraFS, Ceph, CleverSafe, Google Cloud Storage, Gluster, Lustre, QFS, EMC ViPR (more to come)
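Because every HCFS implementation sits behind the same FileSystem abstraction, application code does not change across backends; only the URI scheme and the corresponding fs.<scheme>.impl binding in the configuration do. A minimal sketch (the namenode host and paths are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HcfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The same code works for hdfs://, s3n://, file://, ... as long as an
    // implementation is registered for the scheme.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
    for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
  }
}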

Page 63: Extending Hadoop for Fun & Profit

New Dataset

• Reuse Namenode and Datanode implementations

• Substitute a different DataSet implementation: FsDatasetSpi, FsVolumeSpi

• Jira: HDFS-5194

Page 64: Extending Hadoop for Fun & Profit

Extending Namenode

• Pluggable Namespace: HDFS-5324, HDFS-5389

• Pluggable Block Management: HDFS-5477

• Requires fine-grained locking in Namenode: HDFS-5453

Page 65: Extending Hadoop for Fun & Profit

Questions?

