Date post: | 02-Nov-2014 |
Category: |
Technology |
Upload: | korea-sdec |
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC. Copyright for all other & referenced work is retained by their respective owners.
Introducing Hadoop
Mastering Hadoop MapReduce for Data Analysis
Shashank Tiwari
blog: shanky.org | twitter: @tshanky
What is Hadoop
HDFS Architecture
Namenode/Datanode, JobTracker/TaskTracker
MapReduce
ZK Namespace
Essential HBase Schema
Multi-dimensional View
A Map/Hash View
{
  "row_key_1" : {
    "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
    "location" : { "zip" : "94301" }
  }
}
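The map/hash view above can be sketched in Python as nested dictionaries: row key, then column family, then column. This is only an in-memory illustration of the data model, not HBase client code. It assumes `location` is a second column family on the same row, and `get_cell` is a hypothetical helper introduced for this sketch.

```python
# In-memory illustration of the map/hash view:
# row key -> column family -> column -> value.
# Not HBase client code; get_cell is a hypothetical helper for this sketch.
table = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        # Assumption: "location" is a second column family of the same row.
        "location": {"zip": "94301"},
    }
}


def get_cell(table, row_key, family, column, default=None):
    """Return the value at (row_key, family:column), or default if absent."""
    return table.get(row_key, {}).get(family, {}).get(column, default)


print(get_cell(table, "row_key_1", "name", "first_name"))  # Jolly
print(get_cell(table, "row_key_1", "location", "zip"))     # 94301
print(get_cell(table, "row_key_1", "location", "city"))    # None
```

A missing row, family, or column simply yields the default, which mirrors the sparse nature of the model: a row only carries the columns that were actually written.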
Architectural View (HBase)
The Persistence Mechanism
The underlying file format
Installing & Setting up Hadoop
• Required software: Java 1.6.x, ssh + sshd
• Download
• Install
• Configure
  • single-node
  • pseudo-distributed
  • cluster
Download
• Source: http://hadoop.apache.org/
• Version:
  • 0.20.203.x -- current stable
  • 0.20.x -- previous stable
• Includes
  • Hadoop Common -- common utilities, HDFS, MapReduce
Install
• Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz
• Move & Create Symbolic Link
  • ln -s hadoop-0.20.203.0 hadoop
• On Windows
  • http://developer.yahoo.com/hadoop/tutorial/module3.html
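For scripted installs, the extract and symbolic-link steps could be wrapped in a small Python function. This is a minimal sketch, not part of the Hadoop distribution: `extract_and_link` and its parameters are hypothetical names, and it assumes a platform where the current user may create symbolic links.

```python
import os
import tarfile


def extract_and_link(tarball, dest_dir, extracted_name, link_name="hadoop"):
    """Extract tarball into dest_dir, then link dest_dir/link_name to the extracted directory.

    extracted_name is the top-level directory inside the archive,
    for example "hadoop-0.20.203.0".
    """
    with tarfile.open(tarball, "r:gz") as archive:
        archive.extractall(dest_dir)
    link_path = os.path.join(dest_dir, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link left by an earlier install
    # Relative target, equivalent to `ln -s hadoop-0.20.203.0 hadoop` run inside dest_dir.
    os.symlink(extracted_name, link_path)
    return link_path
```

A call such as `extract_and_link("hadoop-0.20.203.0rc1.tar.gz", ".", "hadoop-0.20.203.0")` would mirror the two commands above. Keeping the link relative means the install directory can be moved as a whole without breaking it.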
Configure -- single-node
• Edit: conf/hadoop-env.sh
  • Set JAVA_HOME
• Default configuration is single-node
• Run bin/hadoop with no arguments to list the command options
• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
Configure -- pseudo-distributed
• Edit: conf/core-site.xml (configure HDFS daemon)
• Edit: conf/hdfs-site.xml (configure HDFS replication factor)
• Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)
• Enable ssh to localhost (without passphrase)
• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html
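As a rough sketch of what goes into the three files, the Python snippet below renders one property per file in Hadoop's `<configuration>` format. The property names and values (`fs.default.name`, `dfs.replication`, `mapred.job.tracker`, ports 9000 and 9001) follow the pseudo-distributed example in the referenced setup guide; treat them as a starting point rather than required values. `render_site_xml` is a hypothetical helper, and the snippet prints the XML instead of writing to `conf/`.

```python
import xml.etree.ElementTree as ET

# Values follow the pseudo-distributed example in the Hadoop 0.20.x
# single-node setup guide; the ports are conventions, not requirements.
PSEUDO_DISTRIBUTED = {
    "conf/core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "conf/hdfs-site.xml": {"dfs.replication": "1"},
    "conf/mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def render_site_xml(properties):
    """Render a {name: value} mapping as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Print each file name followed by the XML it would contain.
for path, properties in PSEUDO_DISTRIBUTED.items():
    print(path)
    print(render_site_xml(properties))
```

A replication factor of 1 makes sense here because a pseudo-distributed setup has a single DataNode, so there is nowhere to place additional replicas.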
Start Hadoop
• Format HDFS: bin/hadoop namenode -format
• Start all daemons: bin/start-all.sh
• Verify logs
• Browse the web interface:
  • Namenode: http://localhost:50070/
  • JobTracker: http://localhost:50030/
Take Hadoop for a test-drive
• Run examples (hadoop-examples-0.20.203.0.jar)
• Grep using regular expressions
  • Copy files to HDFS: bin/hadoop fs -put bin input
  • Grep the copied files for text beginning with 'start'
  • Verify output on HDFS: bin/hadoop fs -cat output/*
  • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
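To see what the grep example computes, the following pure-Python sketch imitates its map, shuffle, and reduce steps in a single process. It is an illustration of the programming model, not Hadoop code: the sample lines and the regular expression `start[a-z.-]+` are assumptions standing in for the scripts copied into `input` and for whatever pattern is passed on the command line.

```python
import re
from collections import defaultdict


def map_phase(lines, pattern):
    """Emit (matched_text, 1) for every regex match in every input line."""
    regex = re.compile(pattern)
    for line in lines:
        for match in regex.finditer(line):
            yield match.group(0), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical stand-in for the contents of the files copied into `input`.
lines = [
    "bin/start-all.sh",
    "start-all.sh calls start-dfs.sh",
    "bin/stop-all.sh",
]
counts = reduce_phase(shuffle(map_phase(lines, r"start[a-z.-]+")))

# Print the most frequent matches first.
for text, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(count, text)
```

On a real cluster the map calls run in parallel over blocks of the input files and the framework performs the grouping; only the per-line matching and the per-key summing are user code.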
Configure -- cluster
• References:
  • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation)
  • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN)
  • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
Questions?
• blog: shanky.org | twitter: @tshanky