YARN
[Yet Another Resource Negotiator]
YARN is another important and powerful layer of the Hadoop ecosystem. It is developed by Apache and was introduced in Hadoop 2.x to overcome the drawbacks of MapReduce. I have already explained what MapReduce is, its components and its drawbacks in my last post.
Before YARN came into the picture, MapReduce itself did the resource management as well as the job scheduling and monitoring: a single job tracker assigned tasks to the task trackers and monitored them. For example, assume that a single MapReduce job tracker is managing more than 2000 task trackers. That one job tracker has to assign tasks to all of the task trackers, monitor every assigned task, and also handle resource allocation for all of the tasks. That makes its work very complicated. YARN was introduced to take this load off MapReduce.
MapReduce processes data in batches, and batch processing cannot deliver real-time results. For example, if you want to know which post is trending right now on Facebook, or which tweets are most popular on Twitter, that is very difficult to find out with a MapReduce program. In Hadoop 1.x, however, MapReduce batch processing was the only technique that could run on Hadoop. So YARN was designed to support multiple processing engines and programming models, including ones that solve real-time problems. YARN is often described as the next generation of MapReduce.
Components of YARN:
Resource manager (master)
Node manager (slave)
Resource Manager:
Resource manager acts as the master. It has two components, called the scheduler and the application manager. The scheduler allocates resources to the running applications, and the application manager accepts submitted jobs and gets their tasks started on the node managers.
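To make the scheduler's role a little more concrete, here is a minimal toy sketch in Python. It is not the real YARN API: the names `ToyNode`, `ToyScheduler` and `allocate` are made up for illustration, memory is the only resource tracked, and the first-fit placement rule is an assumption, not how an actual YARN scheduler decides.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node manager with a fixed memory budget."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ToyScheduler:
    """Hypothetical scheduler: it only decides where resources are granted."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, mem_mb):
        # First-fit (an assumption): take the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                node.containers.append((app_id, mem_mb))
                return node.name
        return None  # no node can host this request right now


nodes = [ToyNode("node-1", 2048), ToyNode("node-2", 4096)]
scheduler = ToyScheduler(nodes)

# Three applications ask for memory; the last one does not fit anywhere.
placements = [
    scheduler.allocate("app-1", 1536),
    scheduler.allocate("app-2", 1024),
    scheduler.allocate("app-3", 4096),
]
print(placements)
```

The point of the sketch is only the separation of duties: the scheduler hands out resources and does not run or monitor the tasks itself.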
Node Manager:
Node manager acts as the slave. It receives tasks from the resource manager and processes them with the help of the data node (the place where the actual data is located). Two components run on the node manager: the container and the application master.
The container and the application master process the assigned task. The task may be a MapReduce, Pig or Hive job, and multiple types of jobs can wait in a queue. The data for the task is stored as a number of split blocks on the data node. In any case, it is similar to the Map and Reduce concept.
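The idea of a file being stored as split blocks, with the work divided per block, can be sketched as follows. This is a simplified illustration only: the function name `split_into_blocks` is hypothetical, the 128 MB block size is assumed here (it is configurable in a real cluster), and real task planning involves more than one task per block.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size for this sketch


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return (block_index, block_size_mb) pairs for a file of the given size."""
    count = math.ceil(file_size_mb / block_size_mb)
    blocks = []
    for index in range(count):
        start = index * block_size_mb
        # The last block may be smaller than the full block size.
        size = min(block_size_mb, file_size_mb - start)
        blocks.append((index, size))
    return blocks


blocks = split_into_blocks(300)

# In this toy model, one task is created per block.
tasks = [f"task-for-block-{index}" for index, _ in blocks]
print(blocks)
print(tasks)
```

Here a 300 MB file becomes three blocks (128 MB, 128 MB and 44 MB), so three tasks are created, each of which could run in its own container.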
YARN takes care of resource management and supports multiple programming techniques for processing HDFS files.
Simple diagrammatic representation of Hadoop 1.x and 2.x for your better understanding:
I have explained the basics of what makes YARN different from Hadoop 1.x. If you want in-depth knowledge about YARN, you can refer to the Apache YARN site.
In my next post we will look at the other data processing engines (Pig, Hive, Spark, Storm, Giraph and others) that are used in Hadoop through the YARN layer.
Guys, Hadoop is not the only framework for processing big data. Other frameworks also exist, such as MongoDB, so please learn about them as well.
Thanks for reading, and if you have any queries please feel free to ask. If you like this post, share it and leave comments to help make it better.