Page 1: HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters

Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters

Sanket Reddy Chintapalli
Advisor: Dr. Xiao Qin

Presentation Overview

● Synopsis
● MapReduce Programming Model Overview
● HDFS Overview
● Motivation
● Design
● Software Description
● Hardware Description
● Results
● Conclusion

Synopsis

● Data placement strategy
● Heterogeneous Clusters
● Computing Power
● Calculating Computing Ratio
● WordCount and Grep

MapReduce Model

● Hadoop 1.0 and Hadoop 2.0
● Master-Slave Model
● JobTracker and TaskTracker (Hadoop 1.0)
● YARN (Hadoop 2.0)
● ResourceManager (YARN)
● Application Manager (YARN)
● NodeManager (YARN)
● MapReduce Flow

MapReduce Model

MapReduce Model - Hadoop 1.0

MapReduce Model - YARN (Hadoop 2.0)

MapReduce Model - Flow
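To make the map, shuffle and reduce stages concrete, the following minimal single-process sketch simulates the WordCount flow. It only mimics the data flow: in Hadoop each input split would be handled by a map task on some node, and the function names here are illustrative, not Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token in one input split."""
    for word in split.split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # In Hadoop, each split would be processed by a separate map task.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle_phase(intermediate))


if __name__ == "__main__":
    print(word_count(["hadoop hdfs yarn", "hdfs hadoop"]))
    # {'hadoop': 2, 'hdfs': 2, 'yarn': 1}
```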

HDFS

● Namenode
● Datanode
● Replication
● Federated Namenodes
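As a rough illustration of the bookkeeping the Namenode performs, the sketch below splits a file into fixed-size blocks and assigns each block to `replication` distinct Datanodes. It assumes the Hadoop 2.x defaults of a 128 MB `dfs.blocksize` and a `dfs.replication` of 3. The round-robin placement and the Datanode names are hypothetical simplifications; the real HDFS default placement policy is rack-aware.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # assumed Hadoop 2.x default dfs.blocksize (128 MB)
REPLICATION = 3                 # assumed default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Hypothetical round-robin placement: block b goes to `replication`
    consecutive, distinct Datanodes starting at position b. Racks are ignored."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of Datanodes")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```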

HDFS Architecture

HDFS Federated Namenodes

HDFS Federated Namenodes

● Scalability
● Performance
● Isolation (an overloaded namespace does not slow down the others)
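With federation, each Namenode manages its own independent namespace, so a client needs a way to decide which Namenode owns a given path. The sketch below mimics a ViewFs-style client-side mount table with longest-prefix matching; the mount points and Namenode URIs are hypothetical. Because `/data/logs` is served by its own Namenode in this example, heavy activity there does not load the Namenodes serving `/user` or the rest of `/data`.

```python
from typing import Dict

# Hypothetical mount table: mount point -> Namenode that owns that namespace.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "hdfs://nn1:8020",
    "/data": "hdfs://nn2:8020",
    "/data/logs": "hdfs://nn3:8020",
}


def resolve(path: str, mount_table: Dict[str, str] = MOUNT_TABLE) -> str:
    """Return the Namenode responsible for `path`, using the longest mount point
    that is `path` itself or one of its parent directories."""
    matches = [m for m in mount_table if path == m or path.startswith(m + "/")]
    if not matches:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[max(matches, key=len)]


if __name__ == "__main__":
    print(resolve("/data/logs/2014/app.log"))  # hdfs://nn3:8020
    print(resolve("/data/input/part-0"))       # hdfs://nn2:8020
```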

Motivation

Software Description

● Hadoop 2.3.0
● Maven
● Eclipse
● Protocol Buffers

Hardware Description

Design

Run the WordCount and Grep applications on individual nodes
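Hadoop's bundled Grep example counts the matches of a regular expression in its input and then sorts the matches by frequency. The following single-process sketch reproduces that logic so it is clear what each node computes when the workload runs on it alone; the function names are illustrative, and in Hadoop the counting and the sorting run as two separate MapReduce jobs.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def grep_map(line: str, pattern: re.Pattern) -> List[Tuple[str, int]]:
    """Map: emit (matched text, 1) for every regex match in one line."""
    return [(m.group(0), 1) for m in pattern.finditer(line)]


def grep(lines: Iterable[str], regex: str) -> List[Tuple[str, int]]:
    pattern = re.compile(regex)
    counts = Counter()
    for line in lines:                  # each line belongs to some map task's split
        for match, one in grep_map(line, pattern):
            counts[match] += one        # reduce: sum the 1s per matched string
    # Second stage: order by descending count, ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    logs = ["dfs.replication dfs.blocksize", "dfs.replication"]
    print(grep(logs, r"dfs\.[a-z]+"))  # [('dfs.replication', 2), ('dfs.blocksize', 1)]
```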

Design

Calculate the computing power of individual nodes for a specific application
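One way to express this step is sketched below, assuming each node runs the same application on the same input and that the computing ratio is defined relative to the slowest node (so the slowest node gets 1). The node names and response times are made up for illustration. Because WordCount and Grep stress nodes differently, the ratios would be computed separately for each application.

```python
from typing import Dict


def computing_ratios(response_times: Dict[str, float]) -> Dict[str, float]:
    """Computing ratio of each node, normalized so the slowest node has ratio 1.

    A node that finishes the same workload in half the time gets twice the ratio.
    """
    if not response_times or any(t <= 0 for t in response_times.values()):
        raise ValueError("response times must be positive")
    slowest = max(response_times.values())
    return {node: slowest / t for node, t in response_times.items()}


if __name__ == "__main__":
    # Illustrative response times in seconds for one application.
    print(computing_ratios({"A": 10.0, "B": 20.0, "C": 30.0}))
    # {'A': 3.0, 'B': 1.5, 'C': 1.0}
```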

Design

● Evaluate the default Hadoop data distribution by running Grep and WordCount together on all nodes
● Run the CRBalancer to redistribute data across the nodes according to their computing ratios
● Finally, re-run the applications to measure the effect of the data placement strategy.
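The before/after comparison can be reasoned about with an idealized model: if every node processes only its local blocks, all nodes in parallel, the job ends when the last node finishes. The per-block times below are illustrative assumptions, not measurements from this study, and the model ignores network transfer, speculative execution and non-local tasks.

```python
from typing import Dict


def makespan(blocks: Dict[str, int], secs_per_block: Dict[str, float]) -> float:
    """Idealized job time: each node processes its local blocks in parallel with
    the others, so the job finishes when the slowest-to-finish node is done."""
    return max(blocks[node] * secs_per_block[node] for node in blocks)


speed = {"fast": 1.0, "medium": 2.0, "slow": 3.0}  # illustrative seconds per block
even = {"fast": 22, "medium": 22, "slow": 22}      # 66 blocks split evenly
by_ratio = {"fast": 36, "medium": 18, "slow": 12}  # 66 blocks split 3 : 1.5 : 1

if __name__ == "__main__":
    print(makespan(even, speed))      # 66.0 (the slow node is the bottleneck)
    print(makespan(by_ratio, speed))  # 36.0 (all nodes finish together)
```

In this model, placing blocks in proportion to the computing ratio makes every node finish at the same time, which is the intuition behind a computing-power-aware placement.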

Design - Algorithm

CRBalancer Strategy
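The slides present the CRBalancer strategy only as a figure, so the following is a minimal sketch under stated assumptions rather than the author's implementation: each node's target number of blocks is assumed to be proportional to its computing ratio, and blocks are moved greedily from nodes above their target to nodes below it. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def target_blocks(total: int, ratios: Dict[str, float]) -> Dict[str, int]:
    """Split `total` blocks in proportion to each node's computing ratio, using
    largest-remainder rounding so the targets sum exactly to `total`."""
    weight = sum(ratios.values())
    exact = {n: total * r / weight for n, r in ratios.items()}
    targets = {n: int(x) for n, x in exact.items()}
    leftover = total - sum(targets.values())
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    for n in sorted(exact, key=lambda n: exact[n] - targets[n], reverse=True)[:leftover]:
        targets[n] += 1
    return targets


def plan_moves(current: Dict[str, int], ratios: Dict[str, float]) -> List[Tuple[str, str, int]]:
    """Greedy list of (source, destination, block count) moves that brings every
    node from its current block count to its ratio-based target."""
    targets = target_blocks(sum(current.values()), ratios)
    surplus = [[n, current[n] - targets[n]] for n in current if current[n] > targets[n]]
    deficit = [[n, targets[n] - current[n]] for n in current if current[n] < targets[n]]
    moves = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        count = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], count))
        surplus[i][1] -= count
        deficit[j][1] -= count
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves


if __name__ == "__main__":
    ratios = {"fast": 3.0, "medium": 1.5, "slow": 1.0}
    current = {"fast": 22, "medium": 22, "slow": 22}
    print(target_blocks(66, ratios))    # {'fast': 36, 'medium': 18, 'slow': 12}
    print(plan_moves(current, ratios))  # [('medium', 'fast', 4), ('slow', 'fast', 10)]
```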

Implementation

● CRBalancer
● CRBalancingPolicy
● CRNamenodeConnector
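Hadoop's stock Balancer classifies Datanodes by how far their storage utilization lies from the cluster average, within a `-threshold` that defaults to 10 percent. A ratio-aware policy could instead measure each node's deviation from its ratio-weighted share of blocks. The sketch below illustrates that idea only; it is hypothetical and does not reflect the actual `CRBalancingPolicy` class.

```python
from typing import Dict


def classify(blocks: Dict[str, int], ratios: Dict[str, float],
             threshold: float = 10.0) -> Dict[str, str]:
    """Hypothetical ratio-aware classification. For each node, compare its actual
    share of all blocks with its target share (ratio / sum of ratios), both in
    percent, and flag it when the gap exceeds `threshold` percentage points."""
    total_blocks = sum(blocks.values())
    total_ratio = sum(ratios.values())
    groups = {}
    for node, count in blocks.items():
        actual = 100.0 * count / total_blocks
        target = 100.0 * ratios[node] / total_ratio
        deviation = actual - target
        if deviation > threshold:
            groups[node] = "over-utilized"   # candidate source of block moves
        elif deviation < -threshold:
            groups[node] = "under-utilized"  # candidate destination
        else:
            groups[node] = "balanced"
    return groups


if __name__ == "__main__":
    print(classify({"fast": 22, "medium": 22, "slow": 22},
                   {"fast": 3.0, "medium": 1.5, "slow": 1.0}))
    # {'fast': 'under-utilized', 'medium': 'balanced', 'slow': 'over-utilized'}
```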

Results - WordCount

Results - Grep

Questions?

