Monday, 4 August 2014

What is YARN in Hadoop Architecture




YARN [Yet Another Resource Negotiator]
       
        YARN is another important and powerful layer of the Hadoop ecosystem. It is developed by Apache and was introduced in Hadoop 2.x to overcome the drawbacks of MapReduce. I explained what MapReduce is, its components, and its drawbacks in my last post. Before YARN came into the picture, MapReduce did everything itself: the job tracker managed resources, assigned tasks to the task trackers, and monitored those tasks. For example, assume a single job tracker is managing more than 2000 task trackers. That one job tracker has to assign tasks to all of them, monitor every assigned task, and also handle resource allocation for every task. That makes its job very complicated, and YARN was introduced to take this burden off MapReduce.

        MapReduce also processes data only in batches, and batch processing cannot give real-time results. For example, finding the currently trending posts on Facebook or the most popular tweets on Twitter is too difficult with a MapReduce program. In Hadoop 1.x, though, MapReduce was the only processing technique available. So YARN was designed to support multiple processing engines and programming models, including ones that solve real-time problems. YARN is often described as the next generation of MapReduce.


Components of YARN :

        Resource manager (master)
        Node manager (slave)

Resource Manager :
       
        The resource manager acts as the master. It has two components: the scheduler and the application manager. The scheduler allocates resources, and the application manager accepts submitted jobs and gets each one started on a node manager.
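
To make the scheduler's role a little more concrete, here is a minimal sketch in Python. It is only a toy model and not YARN's real API: the `Node` and `ToyScheduler` classes, the node names, and the memory sizes are all made up for illustration, and a simple first-fit rule stands in for the much more elaborate schedulers that YARN actually ships with.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node as the toy scheduler sees it (hypothetical model)."""
    name: str
    free_memory_mb: int
    containers: list = field(default_factory=list)


class ToyScheduler:
    """First-fit allocator; an assumption for illustration, not YARN's policy."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # Hand out a container on the first node with enough free memory.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                node.containers.append((app_id, memory_mb))
                return node.name
        return None  # no capacity left on any node


nodes = [Node("node-1", 2048), Node("node-2", 4096)]
scheduler = ToyScheduler(nodes)

# Four applications ask for memory one after another.
placements = [
    scheduler.allocate("app-1", 1024),
    scheduler.allocate("app-2", 2048),
    scheduler.allocate("app-3", 1024),
    scheduler.allocate("app-4", 4096),
]
print(placements)
```

The point of the sketch is only that the scheduler's job is bookkeeping about resources. It does not run or monitor the tasks itself, which is exactly the work that was taken away from the old job tracker.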

Node Manager :
       
        The node manager acts as the slave. It receives tasks from the resource manager and processes them with the help of the data node (the place where the actual data is located). Two things run on a node manager: containers and the application master (one per application, itself running inside a container).
       
        The containers and the application master process the assigned task on the data, which is stored as a number of split blocks in the data node. The task may be a MapReduce, Pig, or Hive job, and multiple types of jobs can wait in the queue. In any case, it is similar to the Map and Reduce concept.
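
The idea that a container can run different kinds of work on the stored blocks can be sketched like this. Again this is a toy model in Python, not the real YARN API: `ToyNodeManager`, `word_count`, `filter_errors`, and the sample blocks are all hypothetical names and data chosen for illustration.

```python
class ToyNodeManager:
    """Hypothetical stand-in for a node manager; not YARN's real API."""

    def __init__(self, name):
        self.name = name

    def run_container(self, task, block):
        # A container simply runs whatever task it was given on one data block.
        return task(block)


def word_count(block):
    # MapReduce-style task: count the words in a block.
    return len(block.split())


def filter_errors(block):
    # Query-style task (think Pig or Hive): keep only lines containing ERROR.
    return [line for line in block.splitlines() if "ERROR" in line]


# Two made-up blocks standing in for the split blocks held by a data node.
blocks = ["INFO start\nERROR disk full", "INFO done"]
node = ToyNodeManager("node-1")

counts = [node.run_container(word_count, b) for b in blocks]
errors = [node.run_container(filter_errors, b) for b in blocks]
print(counts, errors)
```

The node manager in this sketch does not care what the task is. That is the part that matters: the same containers can serve a MapReduce-style job and a query-style job side by side.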

YARN takes care of resource management and supports multiple programming techniques for processing files in HDFS.
       
A simple diagram of Hadoop 1.x and 2.x for better understanding:


I have explained the basics of what makes YARN different from Hadoop 1.x. If you want in-depth knowledge of YARN, you can refer to the Apache YARN site.

In my next post we will look at the other data processing engines (Pig, Hive, Spark, Storm, Giraph, and others) that run in Hadoop on the YARN layer.
Guys, Hadoop is not the only framework for big data. There are others too, such as MongoDB, so please learn about them as well.

Thanks for reading, and if you have any queries please feel free to ask. If you like this post, share it and leave comments to help make it better.