8/3/2019 Hadoop Nishant Gandhi.
Hadoop: Introduction & Setup
Prepared by: Nishant M Gandhi
Certified Network Manager by Nettech
Diploma in Cyber Security (pursuing)
C K Pithawalla College of Engineering & Technology, Surat
What is Hadoop?
"Hadoop is a framework for running applications on large clusters built of commodity hardware." (HADOOP WIKI)
Open source, written in Java
Google's MapReduce inspired Yahoo!'s Hadoop.
Now part of the Apache Software Foundation.
Hadoop Architecture on DELL C Series Server
Hadoop Software Stack
Hadoop Common: the common utilities that support the other Hadoop modules
Hadoop Distributed File System (HDFS): a distributed file system
Hadoop MapReduce: distributed processing on compute clusters
Other Hadoop-related projects:
Avro: a data serialization system
Cassandra: a scalable multi-master database
Chukwa: a data collection system for managing large distributed systems
HBase: a scalable, distributed database that supports structured data storage for large tables
Hive: a data warehouse infrastructure that provides data summarization and ad hoc querying
Mahout: a scalable machine learning and data mining library
Pig: a high-level data-flow language and execution framework for parallel computation
ZooKeeper: a coordination service for distributed applications
Who Uses Hadoop?
The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster of more than 10,000 cores and produces data that is used in every Yahoo! Web search query.
On February 19, 2008, Yahoo! Inc. launched what it claimed was the world's largest Hadoop production application.
Who Uses Hadoop?
The data warehouse Hadoop cluster at Facebook. Cluster statistics from the HDFS cluster:
21 PB of storage in a single HDFS cluster
2000 machines: 1200 machines with 8 cores each + 800 machines with 16 cores each
12 TB of disk per machine (a few machines have 24 TB each)
32 GB of RAM per machine
15 map-reduce tasks per machine
That is a total of more than 21 PB of configured storage capacity, larger than the previously known Yahoo! cluster of 14 PB.
Who Uses Hadoop?
Creator of MapReduce
Runs Hadoop for NS-Research cluster
HDFS is inspired by GFS
Who Uses Hadoop?
Other Hadoop Users:
IBM
New York Times
Twitter
Veoh
Amazon
Apple
eBay
AOL
Hewlett-Packard
Joost
MapReduce
Programming model developed at Google
Sort/merge based distributed computing
Initially intended for Google's internal search/indexing application, but now used extensively by other organizations (e.g., Yahoo!, Amazon.com, IBM)
A functional-style programming model (as in LISP) that is naturally parallelizable across a large cluster of workstations or PCs
The underlying system takes care of partitioning the input data, scheduling the program's execution across several machines, handling machine failures, and managing the required inter-machine communication. (This is the key to Hadoop's success.)
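The sort/merge model described above can be sketched with ordinary Unix tools: a "map" step emits one key/value record per word, the shell's sort plays the role of the shuffle that groups identical keys, and a "reduce" step sums per key. This is only a local analogy to illustrate the data flow, not Hadoop itself; the input sentence is a made-up example.

```shell
# Word count in the MapReduce style, using only coreutils:
#   map     -> emit one "<word> 1" record per word
#   shuffle -> sort brings identical keys together
#   reduce  -> sum the counts for each key
RESULT=$(printf 'the cat sat on the mat\nthe cat ran\n' \
  | tr -s ' ' '\n' \
  | sed 's/$/ 1/' \
  | sort \
  | awk '{count[$1] += $2} END {for (w in count) print w, count[w]}' \
  | sort)
echo "$RESULT"
```

In a real Hadoop job the same map and reduce roles run as tasks on many machines, and the framework performs the shuffle over the network.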
MapReduce
Hadoop Distributed File System (HDFS)
At Google, MapReduce operations run on a special file system called the Google File System (GFS), which is highly optimized for this purpose.
GFS is not open source. Doug Cutting and others at Yahoo! reverse-engineered GFS and called their implementation the Hadoop Distributed File System (HDFS).
Goals of HDFS
Very large distributed file system: 10K nodes, 100 million files, 10 PB
Assumes commodity hardware: files are replicated to handle hardware failure; failures are detected and recovered from
Optimized for batch processing: data locations are exposed so that computations can move to where the data resides; provides very high aggregate bandwidth
Runs in user space, on heterogeneous operating systems
DFShell
The HDFS shell is invoked by: bin/hadoop dfs
Available commands:
cat, chgrp, chmod, chown, copyFromLocal, copyToLocal, cp, du, dus, expunge, get, getmerge, ls, lsr, mkdir, moveFromLocal, mv, touchz, put, rm, rmr, setrep, stat, tail, test, text
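For illustration, a few of these subcommands in a hypothetical session (the /user/demo path and data.txt file are made-up examples, and a running HDFS instance is required):

```
bin/hadoop dfs -mkdir /user/demo                    # create an HDFS directory
bin/hadoop dfs -copyFromLocal data.txt /user/demo   # upload a local file
bin/hadoop dfs -ls /user/demo                       # list its contents
bin/hadoop dfs -cat /user/demo/data.txt             # print the file
bin/hadoop dfs -setrep 2 /user/demo/data.txt        # change its replication factor
```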
Hadoop Single Node Setup
Step 1:
Download Hadoop from:
http://hadoop.apache.org/mapreduce/releases.html
Step 2:
Untar the Hadoop archive:
tar xvfz hadoop-0.20.2.tar.gz
Hadoop Single Node Setup
Step 3:
Set the path to the Java installation by editing the JAVA_HOME parameter in hadoop/conf/hadoop-env.sh
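For example, the relevant line in hadoop-env.sh might read as follows. The JDK path shown is only an assumption for illustration; point it at wherever your JDK actually lives.

```shell
# In hadoop/conf/hadoop-env.sh: point JAVA_HOME at the JDK install directory.
# The path below is an example; adjust it for your system.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```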
Hadoop Single Node Setup
Step 4:
Create an RSA key (with an empty passphrase) to be used by Hadoop when sshing to localhost:
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
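The empty passphrase (-P "") is what lets Hadoop ssh to localhost without prompting. The same two commands can be tried safely first in a throwaway directory, so nothing in the real ~/.ssh is overwritten (a sketch; mktemp and ssh-keygen are assumed to be available):

```shell
# Dry run of the key setup in a temporary directory.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -P "" -f "$KEYDIR/id_rsa"      # passphrase-less key pair
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"                 # sshd rejects looser permissions
ls "$KEYDIR"
```

For the real setup, use the paths shown above (~/.ssh/id_rsa and ~/.ssh/authorized_keys), then verify with `ssh localhost` that no password is requested.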
Hadoop Single Node Setup
Step 5:
Make the following changes to the configuration files under hadoop/conf, adding each property inside the <configuration> element.

core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>TEMPORARY-DIR-FOR-HADOOP-DATASTORE</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
Hadoop Single Node Setup
mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>

hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Hadoop Single Node Setup
Step 6:
Format the Hadoop file system. From the hadoop directory, run the following:
bin/hadoop namenode -format
Using Hadoop
1) How to start Hadoop?
cd hadoop/bin
./start-all.sh

2) How to stop Hadoop?
cd hadoop/bin
./stop-all.sh

3) How to copy a file from the local machine to HDFS?
cd hadoop
bin/hadoop dfs -put local_machine_path hdfs_path

4) How to list files in HDFS?
cd hadoop
bin/hadoop dfs -ls
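Putting these together, a first end-to-end session might look like the following (illustrative only: it assumes the hadoop-0.20.2 release directory, whose bundled examples jar provides a grep job, and the input/output paths are placeholders):

```
bin/start-all.sh                                   # start the HDFS and MapReduce daemons
bin/hadoop dfs -put conf input                     # copy the conf directory into HDFS
bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
bin/hadoop dfs -cat output/*                       # inspect the job output
bin/stop-all.sh                                    # stop all daemons
```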
Thank You!