Date posted: 11-Jan-2017
Category: Data & Analytics
Uploaded by: caizerx
HDFS
Meet Hadoop Family: part 1
• What is it? A distributed file system designed to store very large files with streaming data access patterns
• Why is it needed? Very large files; streaming data access; commodity hardware
• Traditional design limits: RAC and MPP systems bring the data to the computation, so the network becomes the bottleneck
• Trade-offs: high-latency data access; not good for lots of small files; write once, with no support for multiple writers
A Client Reading Data From HDFS
A Client Writing Data to HDFS
Network Distances in Hadoop
• distance(/d1/r1/n1, /d1/r1/n1) = 0 (processes on the same node)
• distance(/d1/r1/n1, /d1/r1/n2) = 2 (different nodes on the same rack)
• distance(/d1/r1/n1, /d1/r2/n3) = 4 (nodes on different racks in the same data center)
• distance(/d1/r1/n1, /d2/r3/n4) = 6 (nodes in different data centers)
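The four distances above follow one rule: count the hops from each node up to their closest common ancestor in the /datacenter/rack/node tree and add them. A minimal Python sketch of that rule is shown here; the function name network_distance is made up for illustration and is not a Hadoop API.

```python
def network_distance(a: str, b: str) -> int:
    """Hops from each location up to the closest common ancestor, summed.

    Locations are paths such as "/d1/r1/n1" (data center / rack / node).
    """
    parts_a = a.strip("/").split("/")
    parts_b = b.strip("/").split("/")
    if len(parts_a) != len(parts_b):
        raise ValueError("locations must have the same depth")

    # Count how many leading path components the two locations share.
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1

    # Each side climbs (depth - common) levels to reach the shared ancestor.
    return (len(parts_a) - common) + (len(parts_b) - common)


if __name__ == "__main__":
    for other in ("/d1/r1/n1", "/d1/r1/n2", "/d1/r2/n3", "/d2/r3/n4"):
        print(other, network_distance("/d1/r1/n1", other))
```

Run as a script, this prints 0, 2, 4 and 6 for the four pairs on the slide.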
• HDFS blocks: default size 128 MB (large, so that seek time stays small relative to transfer time), default replication 3x
• NameNode: stores metadata for all blocks in the cluster; storage location is configured with dfs.namenode.name.dir, default /dfs/xx
• DataNodes: store the data blocks and also hold metadata for their local blocks
• POSIX-like (almost) permissions: rw(x), users, groups, mode
• HDFS logs and web interface: port 50070 (NameNode), port 50075 (DataNode)
• WebHDFS / HttpFS REST interface
  Request: http://sabtu:50070/webhdfs/v1/tmp?user.name=hdfs&op=GETFILESTATUS
  Response: {"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":4,"fileId":16386,"group":"supergroup","length":0,"modificationTime":1467099643710,"owner":"hdfs","pathSuffix":"","permission":"1777","replication":0,"type":"DIRECTORY"}}
• High Availability mode
• HDFS federation: a concept similar to namespace / database sharding
• HDFS balancer
• Safe mode
• Distributed copy (distcp)
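To make the block size and replication defaults from the list above concrete, here is a rough Python sketch of the arithmetic. It assumes the slide's defaults (128 MB blocks, 3x replication) and that the last block of a file only occupies as much disk as it actually holds; the helper names are illustrative and not part of any Hadoop API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default block size from the slide, in bytes
REPLICATION = 3                 # default replication factor from the slide


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks a file of file_size bytes is split into."""
    # Integer ceiling division; an empty file has no blocks.
    return -(-file_size // block_size)


def replica_count(file_size: int, replication: int = REPLICATION) -> int:
    """Total block replicas stored across the DataNodes."""
    return block_count(file_size) * replication


def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Approximate bytes used on disk across the cluster.

    Assumes a partially filled last block only uses its real length.
    """
    return file_size * replication


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(block_count(one_gib), replica_count(one_gib), raw_storage(one_gib))
```

Under these assumptions a 1 GiB file is split into 8 blocks, stored as 24 block replicas, and uses about 3 GiB of raw disk. A 1 MB file still costs one block entry in the NameNode metadata, which is one reason many small files are a poor fit.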
Some Features
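The WebHDFS call shown in the feature list can be sketched in Python without touching a cluster: build the request URL, then read fields out of the JSON reply. The host sabtu, port 50070 and the sample response are taken from the slide; the helper names webhdfs_url and describe are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, user: str) -> str:
    """Build a WebHDFS REST URL of the form shown on the slide."""
    query = urlencode({"user.name": user, "op": op})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def describe(response_text: str) -> str:
    """Summarise a GETFILESTATUS reply as 'TYPE owner:group permission'."""
    status = json.loads(response_text)["FileStatus"]
    return f'{status["type"]} {status["owner"]}:{status["group"]} {status["permission"]}'


# Sample reply copied from the slide (fields not used here are omitted).
SAMPLE = (
    '{"FileStatus":{"childrenNum":4,"group":"supergroup","length":0,'
    '"owner":"hdfs","permission":"1777","type":"DIRECTORY"}}'
)

if __name__ == "__main__":
    print(webhdfs_url("sabtu", 50070, "/tmp", "GETFILESTATUS", "hdfs"))
    print(describe(SAMPLE))
```

To query a live cluster, the URL could be passed to urllib.request.urlopen and the response body handed to describe; that step is left out here because it needs a reachable NameNode.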
HDFS Federation
• start cluster: $HADOOP_PREFIX_HOME/bin/start-dfs.sh
• stop cluster: $HADOOP_PREFIX_HOME/bin/stop-dfs.sh
• file operations:
  hdfs dfs -cp x y
  hdfs dfs -ls x
  hdfs dfs -cat x
  hdfs dfs -put x y
  hdfs dfs -get x y
Common Commands
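When the file operations above need to be scripted, one option is a thin Python wrapper around the hdfs command line. The sketch below only covers the five operations listed on the slide and assumes the hdfs binary is on the PATH; the function names are made up for illustration.

```python
import subprocess

# The five file operations listed on the slide.
ALLOWED_OPERATIONS = {"cp", "ls", "cat", "put", "get"}


def hdfs_dfs_command(operation: str, *paths: str) -> list:
    """Return the argument list for `hdfs dfs -<operation> <paths...>`."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["hdfs", "dfs", f"-{operation}", *paths]


def run_hdfs_dfs(operation: str, *paths: str) -> str:
    """Run the command and return its standard output.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        hdfs_dfs_command(operation, *paths),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only builds the command, so it runs without a Hadoop installation.
    print(hdfs_dfs_command("put", "x", "y"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.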
Questions?
https://www.meetup.com/Jakarta-Hadoop-Big-Data/