Date posted: 18-Jul-2015
Category: Education
Uploaded by: siva-sankar
BigData
An Introduction by Keylabs
Need For A New Processing Platform (BigData)
What is BigData?
- Twitter (over ~7 TB/day)
- Facebook (over ~10 TB/day)
- Google (over ~20 PB/day)
Where does it come from?
Existing systems (vertical scalability)
Why Hadoop (horizontal scalability)?
Origin of Hadoop
Companies Using Hadoop
Yahoo, Google, Facebook, LinkedIn, IBM, Amazon, Hortonworks, Cloudera, NY Times … the list goes on.
What is Hadoop?
Flexible infrastructure for large-scale computation & data processing on a network of commodity hardware.
Written almost entirely in Java.
Open source & distributed under Apache license
Hadoop Core Components: HDFS & MapReduce.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
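To make the "simple programming model" concrete, here is a minimal sketch of the MapReduce idea in plain Python. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and everything runs in a single process, whereas Hadoop would spread the map and reduce steps across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (single process, illustration only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

The programmer only writes the map and reduce functions; grouping by key and distributing the work are the framework's job.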
What Hadoop is Not?
A File system
A database
An online transaction processing (OLTP) system
Replacement of all programming logic
Three Vs of Hadoop and counting…
Hadoop Introduction and Architecture
Hadoop High-Level Architecture
Hadoop Architecture
[Diagram: Hadoop cluster. Admin node: Job Tracker and Name Node. Three worker nodes, each with a Task Tracker and a Data Node. The Task Trackers belong to the MapReduce engine; the Data Nodes belong to the HDFS cluster.]
Distributed File System Hadoop Distributed File System
Read 1TB Data
1 Machine: 4 I/O channels at 100 MB/s each
10 Machines: 4 I/O channels at 100 MB/s each
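The comparison above can be worked out with a few lines of arithmetic. This is a rough sketch under stated assumptions: 1 TB is taken as 1,000,000 MB, the data is spread evenly across machines, and all channels read in parallel with no overhead. The helper name `read_time_seconds` is hypothetical.

```python
def read_time_seconds(data_mb, machines, channels_per_machine, mb_per_s_per_channel):
    """Time to read data_mb when every channel on every machine reads in parallel."""
    aggregate_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return data_mb / aggregate_mb_per_s


ONE_TB_MB = 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB

single = read_time_seconds(ONE_TB_MB, machines=1, channels_per_machine=4, mb_per_s_per_channel=100)
cluster = read_time_seconds(ONE_TB_MB, machines=10, channels_per_machine=4, mb_per_s_per_channel=100)

print(f"1 machine:   {single:.0f} s ({single / 60:.1f} min)")    # 2500 s (41.7 min)
print(f"10 machines: {cluster:.0f} s ({cluster / 60:.1f} min)")  # 250 s (4.2 min)
```

Ten machines read the same data in a tenth of the time, which is the motivation for spreading a file across many nodes.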
What’s so Special About Open Source Hadoop?
HDFS - Hadoop Distributed File System
Design of HDFS
Where HDFS is not a good fit
Why Is a Block in HDFS So Large?
Advantage of HDFS?
HDFS is not for:
Low-latency data access
Large number of small files.
Multiple writers, arbitrary file modifications.
HDFS Architecture
Let us Zoom into HDFS
NameNode
Deeper Things about NameNode
Please note down these points
DataNode
What is DataNode?
NameNode and DataNodes
Data Replication
What is Data Replication
Data Replication & Rack Awareness
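As an illustration of rack awareness, the sketch below follows the commonly described default HDFS placement for three replicas: the first on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. It is a simplified, deterministic sketch, not Hadoop's implementation: real HDFS chooses among eligible nodes randomly and also considers load and free space. The function name `place_replicas` and the topology layout are assumptions made for this example.

```python
def place_replicas(writer, topology):
    """Pick three nodes for one block.

    topology maps a rack name to its list of DataNode names (hypothetical layout).
    writer is the DataNode the client is writing from.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]

    # Replica 1: the writer's own node.
    first = writer

    # Replica 2: a node on a different rack, so a whole-rack failure loses
    # at most two of the three copies. Picks the first rack with >= 2 nodes.
    remote_rack = next(
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    )
    second = topology[remote_rack][0]

    # Replica 3: a different node on the same remote rack as replica 2.
    third = next(node for node in topology[remote_rack] if node != second)

    return [first, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas("dn1", topology)
print(replicas)  # ['dn1', 'dn3', 'dn4']
```

Keeping two replicas on one remote rack limits cross-rack traffic during the write while still surviving the loss of either rack.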
File Write Operation
A client writing the data to HDFS
File Write Operation in Depth - 1
File Write Operation in Depth - 2
File Write Operation in Depth - 3
File Write Operation in Depth - 4
File Write Operation – Unhappy Path
File Read Operation
A client reading data from HDFS
File Read Operation in Depth - 1
File Read Operation in Depth - 2
File Read Operation in Depth - 3
File Read Operation - Unhappy Path
Secondary NameNode
Hadoop Cluster – A Typical Scenario
Hadoop Ecosystem
Data Loading Techniques and Analysis
When should we go for Hadoop?
Data is too huge
Processes are independent
Online analytical processing (OLAP)
Better scalability
Parallelism
Unstructured data
THANK YOU FOR YOUR ATTENTION!