Post on 15-Aug-2015
Source: Intel
Source: Google data centers
HDFS
$> hadoop fs -ls
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2015-06-11 11:27 dir
-rw-r--r--   1 hadoop supergroup    2198927 2015-06-10 17:22 file1.txt
$> hadoop fs -ls dir
Found 2 items
-rw-r--r--   1 hadoop supergroup    2198927 2015-06-10 17:22 dir/file2.txt
-rw-r--r--   1 hadoop supergroup    2198927 2015-06-10 17:22 dir/file3.txt
$> hadoop fs -cat dir/file3.txt
line1
line2
line3
line4
line5
MapReduce
Processing: MapReduce
Storage: HDFS - Hadoop Distributed File System
Hardware: Node x8
MapReduce Job
Split x4 -> Map x4 -> Reduce x3 -> Write x3
map() {
  // Your code here
}
reduce() {
  // Your code here
}
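The job flow on this slide (split, map, reduce, write) can be sketched in a few lines of plain Python. This is only an illustration under assumptions: it runs in a single process, it does not use Hadoop's API, the word-count logic stands in for "your code here", and the names `split_input`, `map_fn`, `reduce_fn` and `run_job` are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, num_splits):
    """Divide the input lines into splits, one per map task."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_fn(line):
    """Stand-in for the user's map(): emit (word, 1) for every word."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Stand-in for the user's reduce(): sum the counts for one word."""
    return word, sum(counts)


def run_job(lines, num_splits=4):
    splits = split_input(lines, num_splits)
    # Map phase: every line of every split is mapped independently.
    mapped = chain.from_iterable(
        map_fn(line) for split in splits for line in split
    )
    # Shuffle: group values by key so each key reaches one reduce call.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase; the sorted list plays the role of the written output.
    return sorted(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    for word, count in run_job(["line1 line2", "line2 line3", "line3 line3"]):
        print(word, count)
```

On a real cluster the map calls run in parallel on the nodes that hold each split, and the grouping step happens over the network between the map and reduce tasks.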
The Big Data problem
Data Pipeline
Application
Processing: Job, Job
Storage: HDFS - Hadoop Distributed File System
Hardware: Node x8
MapReduce 1.0 Architecture
Node: JobTracker
Node: TaskTracker (x4)
Tasks on a TaskTracker node:
Map Map Map Map
Map Map Map Reduce
Reduce Reduce Reduce Reduce
Application: Map, then Reduce
The Big Data problem
YARN - Yet Another Resource Negotiator
Processing
Resource Manager: YARN
Storage: HDFS - Hadoop Distributed File System
Hardware: Node x8
YARN Architecture
ResourceManager
NodeManager (x2)
x8 x8
x8 x8
Application: x6, 1 core, 1024MB (etl)
Application 2: x4, 2 cores, 2048MB (query)
ApplicationMaster (one per application)
Container: Map, then Reduce
Scheduler
etl: weight 1
query: weight 2
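The scheduler weights on this slide (etl: 1, query: 2) can be read as a ratio for dividing cluster resources while both queues are busy, as in a weighted fair-share scheme. A minimal sketch, assuming a hypothetical 12288 MB of schedulable memory; the `Queue` class and `fair_shares` function are illustrative names and not part of YARN's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    name: str
    weight: int


def fair_shares(total_mb, queues):
    """Split total_mb between queues in proportion to their weights."""
    total_weight = sum(q.weight for q in queues)
    return {q.name: total_mb * q.weight // total_weight for q in queues}


# Weights taken from the slide; the cluster size is an assumption.
queues = [Queue("etl", 1), Queue("query", 2)]
shares = fair_shares(12288, queues)

if __name__ == "__main__":
    for name, share_mb in shares.items():
        print(f"{name}: {share_mb} MB")
```

With these assumed numbers the etl queue would hold four 1024MB containers at a time and the query queue four 2048MB containers. A real scheduler also accounts for CPU, idle queues and preemption, which this sketch ignores.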
The Big Data problem
New Paradigms
Application
Processing: Batch, In Memory / Streaming, Interactive SQL, ...
Resource Manager: YARN
Storage: HDFS - Hadoop Distributed File System
Hardware: Node x8
Trovit
More than 70 MapReduce jobs adding business value
Multi-tenant cluster executing more than 7,000 jobs per day
Search engine
Business Intelligence
Mailing
Push Notifications
Online Media Buying
Trovit
Challenges
Maintain
Try new paradigms
Fine tune
Trovit
Data Analysis with SQL on Hadoop
Hive: SQL on M/R
Sqoop on MySQL
Challenges: Impala (Interactive), Machine Learning
Near Real Time on a Storm cluster
Separated Cluster
Challenges: Storm on YARN
Thank You
Ferran Galí i Reniu
@ferrangali
Icons made by Freepik from Flaticon, licensed under CC BY 3.0