Posted on 29-Jun-2015
Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters
Sanket Reddy Chintapalli
Advisor: Dr. Xiao Qin
Presentation Overview
● Synopsis
● MapReduce Programming Model Overview
● HDFS Overview
● Motivation
● Design
● Software Description
● Hardware Description
● Results
● Conclusion
Synopsis
● Data placement strategy
● Heterogeneous Clusters
● Computing Power
● Calculating Computing Ratio
● WordCount and Grep
MapReduce Model
● Hadoop 1.0 and Hadoop 2.0
● Master-slave model
● JobTracker and TaskTracker (Hadoop 1.0)
● YARN (Hadoop 2.0)
● ResourceManager (YARN)
● ApplicationMaster (YARN)
● NodeManager (YARN)
● MapReduce flow
MapReduce Model
MapReduce Model - Hadoop 1.0
MapReduce Model - YARN (Hadoop 2.0)
MapReduce Model - Flow
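The MapReduce flow (map, shuffle/sort, reduce) can be illustrated with a minimal in-memory sketch of WordCount, one of the two benchmark applications used here. This is not Hadoop code and uses no Hadoop API; the function names are hypothetical, and each input string merely stands in for one input split handled by one map task.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word, 1


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle and sort: group intermediate values by key, keys in sorted order.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: sum the partial counts for one word.
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Each string in `splits` plays the role of one input split / map task.
    intermediate = [pair
                    for split in splits
                    for line in split.splitlines()
                    for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog\nthe end"]))
    # prints {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster the map tasks run on the nodes that hold the input blocks, which is why where the data is placed affects how long the map phase takes on heterogeneous hardware.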
HDFS
● Namenode
● Datanode
● Replication
● Federated Namenodes
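For replication factor 3, HDFS's default placement puts the first replica on the writer's DataNode (or a random node if the client is outside the cluster), the second on a node in a different rack, and the third on a different node in that same remote rack. The sketch below encodes only that rack-aware rule, as a simplified illustration; the real placement policy also weighs factors such as free space and node load. The function name and cluster layout are hypothetical. This rule is the baseline that a computing-power-aware placement strategy would change.

```python
import random
from typing import Dict, List, Optional


def place_replicas(writer: Optional[str],
                   racks: Dict[str, List[str]],
                   rng: random.Random) -> List[str]:
    """Pick three DataNodes using a simplified rack-aware rule (replication factor 3)."""
    node_to_rack = {node: rack for rack, nodes in racks.items() for node in nodes}
    all_nodes = list(node_to_rack)
    # Replica 1: the writer's own DataNode, or a random node for an off-cluster client.
    first = writer if writer in node_to_rack else rng.choice(all_nodes)
    # Replica 2: a node on a different rack; only racks with 2+ nodes qualify,
    # so that replica 3 can share that rack.
    remote = [n for n in all_nodes
              if node_to_rack[n] != node_to_rack[first]
              and len(racks[node_to_rack[n]]) >= 2]
    if not remote:
        raise ValueError("need a remote rack with at least two DataNodes")
    second = rng.choice(remote)
    # Replica 3: a different node on the same rack as replica 2.
    third = rng.choice([n for n in racks[node_to_rack[second]] if n != second])
    return [first, second, third]


if __name__ == "__main__":
    cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
    print(place_replicas("n1", cluster, random.Random(42)))
```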
HDFS Architecture
HDFS Federated Namenodes
HDFS Federated Namenodes
● Scalability
● Performance
● Isolation (an overloaded Namenode does not affect other namespaces)
Motivation
Software Description
● Hadoop 2.3.0
● Maven
● Eclipse
● Protocol Buffers
Hardware Description
Design
Run the WordCount and Grep applications on each node individually
Design
Calculate the computing power of each individual node for a specific application
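One way to turn per-node run times into computing ratios is sketched below. The definition is an assumption, not taken from the slides: ratio_i = T_slowest / T_i, so the slowest node gets 1.0 and a node that finishes the same job twice as fast gets 2.0. Ratios are computed separately per application, because WordCount and Grep may stress nodes differently. The timings are hypothetical placeholders, not measurements from this study.

```python
from typing import Dict


def computing_ratios(exec_times: Dict[str, float]) -> Dict[str, float]:
    """Computing ratio per node for ONE application, from its measured run times.

    Assumed definition: ratio_i = T_slowest / T_i, so the slowest node gets 1.0
    and a node that finishes the same job twice as fast gets 2.0.
    """
    if any(t <= 0 for t in exec_times.values()):
        raise ValueError("execution times must be positive")
    slowest = max(exec_times.values())
    return {node: slowest / t for node, t in exec_times.items()}


if __name__ == "__main__":
    # Hypothetical WordCount timings in seconds, one standalone run per node.
    wordcount_times = {"node1": 100.0, "node2": 50.0, "node3": 200.0}
    print(computing_ratios(wordcount_times))
    # prints {'node1': 2.0, 'node2': 4.0, 'node3': 1.0}
```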
Design
● Evaluate the Hadoop distribution by running Grep and WordCount together on all nodes
● Run the CRBalancer to balance the nodes
● Finally, re-run the applications to note the ramifications of the data placement strategy.
Design - Algorithm
CRBalancer Strategy
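A minimal sketch of what a computing-ratio balancing step could look like, assuming each node's target share of blocks is proportional to its computing ratio. Two further choices are assumptions made for illustration: the targets are rounded with the largest-remainder method, and surplus nodes are greedily paired with deficit nodes to produce a list of block moves. The function names are hypothetical and do not describe the actual CRBalancer code.

```python
from typing import Dict, List, Tuple


def target_blocks(ratios: Dict[str, float], total_blocks: int) -> Dict[str, int]:
    # Share of blocks proportional to computing ratio, rounded so the
    # targets still add up to total_blocks (largest-remainder method).
    total_ratio = sum(ratios.values())
    exact = {n: total_blocks * r / total_ratio for n, r in ratios.items()}
    target = {n: int(x) for n, x in exact.items()}
    leftover = total_blocks - sum(target.values())
    for n in sorted(exact, key=lambda n: exact[n] - target[n], reverse=True)[:leftover]:
        target[n] += 1
    return target


def plan_moves(current: Dict[str, int],
               target: Dict[str, int]) -> List[Tuple[str, str, int]]:
    # Greedily pair nodes holding surplus blocks with nodes that have a deficit.
    surplus = [[n, current[n] - target[n]] for n in current if current[n] > target[n]]
    deficit = [[n, target[n] - current[n]] for n in current if current[n] < target[n]]
    moves: List[Tuple[str, str, int]] = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        count = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], count))
        surplus[i][1] -= count
        deficit[j][1] -= count
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves


if __name__ == "__main__":
    ratios = {"n1": 2.0, "n2": 4.0, "n3": 1.0}    # hypothetical ratios
    current = {"n1": 30, "n2": 30, "n3": 30}      # evenly spread blocks
    target = target_blocks(ratios, sum(current.values()))
    print(target)                        # prints {'n1': 26, 'n2': 51, 'n3': 13}
    print(plan_moves(current, target))   # prints [('n1', 'n2', 4), ('n3', 'n2', 17)]
```

Computing the whole plan before moving anything keeps the number of transferred blocks at the minimum needed to reach the targets.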
Implementation
● CRBalancer
● CRBalancingPolicy
● CRNamenodeConnector
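Hadoop's stock HDFS Balancer treats a DataNode as over- or under-utilized when its storage utilization differs from the cluster average by more than a threshold. A policy named like CRBalancingPolicy could plausibly keep that shape but compare each node against its computing-ratio share instead of the average. The sketch below assumes exactly that. `NodeReport` is a hypothetical stand-in for the per-node data a CRNamenodeConnector might fetch, `ComputingRatioPolicy` is a hypothetical class, and neither is a Hadoop API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeReport:
    # Hypothetical per-node data, e.g. as fetched from the Namenode.
    name: str
    blocks: int


class ComputingRatioPolicy:
    # Hypothetical policy: classify nodes against their computing-ratio share.
    def __init__(self, ratios: Dict[str, float], threshold: float = 0.10):
        self.ratios = ratios
        self.threshold = threshold  # tolerated relative deviation from the share

    def classify(self, reports: List[NodeReport]) -> Dict[str, List[str]]:
        total_blocks = sum(r.blocks for r in reports)
        total_ratio = sum(self.ratios[r.name] for r in reports)
        result: Dict[str, List[str]] = {"over": [], "under": [], "balanced": []}
        for r in reports:
            expected = total_blocks * self.ratios[r.name] / total_ratio
            deviation = (r.blocks - expected) / expected
            if deviation > self.threshold:
                result["over"].append(r.name)      # should give blocks away
            elif deviation < -self.threshold:
                result["under"].append(r.name)     # should receive blocks
            else:
                result["balanced"].append(r.name)
        return result


if __name__ == "__main__":
    policy = ComputingRatioPolicy({"n1": 2.0, "n2": 4.0, "n3": 1.0})
    even = [NodeReport("n1", 30), NodeReport("n2", 30), NodeReport("n3", 30)]
    print(policy.classify(even))
    # prints {'over': ['n1', 'n3'], 'under': ['n2'], 'balanced': []}
```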
Results - WordCount
Results - Grep
Questions?