
Introduction to Yarn

Date post: 16-Apr-2017
Bhupesh Chawda, DataTorrent - Introduction to YARN: Next Gen Hadoop
Transcript
Page 1: Introduction to Yarn

Bhupesh Chawda

DataTorrent

Introduction to YARN
Next Gen Hadoop

Page 2: Introduction to Yarn

Image Source: https://memegenerator.net/instance/64508420

Page 3: Introduction to Yarn

Why YARN

Hadoop v1 (MR1) Architecture

● Job Tracker
  ○ Manages cluster resources
  ○ Job scheduling
  ○ Bottleneck
● Task Tracker
  ○ Per-node agent
  ○ Manages tasks
  ○ Map / Reduce task slots

[Diagram: Clients, a JobTracker, and TaskTrackers running Tasks, with arrows labelled "Job Submission" and "MapReduce Status".]

Page 4: Introduction to Yarn

Why YARN (Cont…)

Limitations with MR1

• Scalability
  Maximum cluster size: 4,000 nodes
  Maximum concurrent tasks: 40,000
• Availability - Job Tracker is a single point of failure (SPOF)
• Resource Utilization - fixed Map / Reduce slots; a free slot of one type cannot run a task of the other type
• Runs only MapReduce applications
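The resource utilization limitation can be illustrated with a little arithmetic. The sketch below is a toy model, not Hadoop code: the function names and the node sizes are hypothetical, and it assumes every task needs exactly one slot or one unit of a shared pool.

```python
def mr1_running_tasks(map_slots, reduce_slots, pending_maps, pending_reduces):
    """MR1-style model: a slot only runs tasks of its own type."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def pooled_running_tasks(total_units, pending_maps, pending_reduces):
    """Pooled model: any free unit can run any pending task."""
    return min(total_units, pending_maps + pending_reduces)


# Hypothetical node with 4 map slots and 4 reduce slots, map-heavy workload.
print(mr1_running_tasks(4, 4, pending_maps=8, pending_reduces=0))   # 4
print(pooled_running_tasks(8, pending_maps=8, pending_reduces=0))   # 8
```

With the same hardware, the slot-based model leaves the four reduce slots idle while map tasks queue; the pooled model runs all eight.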

Page 5: Introduction to Yarn
Page 6: Introduction to Yarn

Introducing YARN

● YARN - Yet Another Resource Negotiator
● Framework that facilitates writing arbitrary distributed processing frameworks and applications.
● YARN applications/frameworks: e.g. MapReduce2, Apache Spark, Apache Giraph, Apache Apex etc.

Image Source: http://tm.durusau.net/?cat=1525

Page 7: Introduction to Yarn

Hadoop beyond Batch

YARN for better resource utilization

More applications than MapReduce

Page 8: Introduction to Yarn

Comparing MapReduce with YARN

MapReduce                  YARN
Job Tracker                Resource Manager, Application Master
Task Tracker               Node Manager
Map Slot, Reduce Slot      Container

Backward Compatibility Maintained!

● Existing MapReduce jobs run as-is on the YARN framework

● No Job Tracker or Task Tracker processes

Page 9: Introduction to Yarn

Hadoop v2 (YARN) Architecture

• Resource Manager
  Manages and allocates cluster resources
  Application scheduling
  Applications Manager
• Node Manager
  Per-machine agent
  Manages life-cycle of containers
  Monitors resources
• Application Master
  Per-application
  Manages application scheduling and task execution

Image Source: hadoop.apache.org
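The division of labour between the Resource Manager and the Node Managers can be sketched as a toy model. This is not the YARN API: the class and method names are hypothetical, memory is the only resource tracked, and placement is a simple first-fit, whereas real YARN schedulers are pluggable. The sketch also collapses allocation and container start into one call; in YARN the Application Master asks the Node Manager to start a container after the Resource Manager grants it.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    container_id: int
    memory_mb: int


@dataclass
class NodeManager:
    """Toy per-machine agent: monitors memory and tracks container life-cycle."""
    host: str
    capacity_mb: int
    running: dict = field(default_factory=dict)

    def free_mb(self):
        return self.capacity_mb - sum(c.memory_mb for c in self.running.values())

    def start_container(self, container):
        # Refuse the container if this machine does not have enough free memory.
        if container.memory_mb > self.free_mb():
            return False
        self.running[container.container_id] = container
        return True

    def stop_container(self, container_id):
        self.running.pop(container_id, None)


@dataclass
class ResourceManager:
    """Toy cluster-wide allocator using first-fit placement (an assumption)."""
    nodes: list
    next_id: int = 1

    def allocate(self, memory_mb):
        # Return (host, Container) for the first node with room, else None.
        for node in self.nodes:
            container = Container(self.next_id, memory_mb)
            if node.start_container(container):
                self.next_id += 1
                return node.host, container
        return None


rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 2048)])
print(rm.allocate(3072))  # placed on node1
print(rm.allocate(2048))  # node1 is too full, placed on node2
print(rm.allocate(2048))  # no node has room: None
```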

Page 10: Introduction to Yarn

Application Submission workflow

[Diagram: a YarnClient, a Resource Manager node (Applications Manager + Scheduler), and Node Manager nodes hosting the Application Master and containers.]

1) Submit application
2) Launch Application Master
3) AM registers with RM
4) AM negotiates for containers
5) Launch container

Legend: RM = Resource Manager, NM = Node Manager, AM = Application Master; a separate marker in the diagram denotes heartbeats.
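The five steps above can be traced with a small simulation. This is a minimal sketch, not YARN client code: the function name, the application name, and the idea of counting free containers as a single integer are all assumptions made for illustration.

```python
def run_submission(app_name, containers_needed, free_containers):
    """Return the ordered events for one toy application submission."""
    events = [f"1) client submits {app_name} to RM"]
    if free_containers < 1:
        # Without a free container the Application Master cannot start.
        events.append("RM has no free container for the AM; application waits")
        return events

    events.append(f"2) RM launches AM for {app_name} in a container")
    free_containers -= 1  # the AM itself occupies one container
    events.append("3) AM registers with RM")

    granted = min(containers_needed, free_containers)
    events.append(
        f"4) AM requests {containers_needed} containers, RM grants {granted}"
    )
    for i in range(granted):
        events.append(f"5) AM asks NM to launch container {i + 1}")
    return events


for line in run_submission("wordcount", containers_needed=3, free_containers=3):
    print(line)
```

In this run the Application Master takes one of the three free containers, so only two of the three requested containers are granted. The toy model stops there; it does not model later requests once more resources free up.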

Page 11: Introduction to Yarn

Application Masters - One for each Application Type

● MapReduce Application: MapReduce Application Master (already provided by Hadoop as a backward compatibility option for MapReduce)
● Apex Application: Apex Application Master (StrAM) (provided by Apache Apex)
● Flink Application: Flink Application Master
● Giraph Application: Giraph Application Master

Page 12: Introduction to Yarn

Key Takeaways

● YARN enables non-MapReduce applications to run in a distributed fashion
● Each application first asks for a container for the Application Master
  ○ The Application Master then talks to YARN to get the resources needed by the application
  ○ Once YARN allocates the requested containers, the Application Master starts the application components in those containers.
● Hadoop is no longer just batch processing!

Page 13: Introduction to Yarn
Page 14: Introduction to Yarn

References

● Simple YARN code example
  ○ https://github.com/hortonworks/simple-yarn-app
● Document references
  ○ https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
  ○ http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
  ○ http://www.slideshare.net/
● Acknowledgements
  ○ Priyanka Gugale, DataTorrent - Slide deck

