SAS Grid Manager for Hadoop nicely tied into YARN (Part 2)


In Part 1 of this series, Cheryl Doninger described how SAS Grid Manager can extend your investment in the Hadoop infrastructure. In this post, we’ll take a look at how Cloudera Manager helps Hadoop administrators meet competing service level agreements (SLAs).

Cloudera Manager lets Hadoop admins set up queues to meet competing SLAs, and it enables them to manage the queues in a visual, intuitive way. For example, admins can change a queue’s configuration based on the priority of the job or the time of day. And they can rebalance based on new hardware configurations or as more load is added to the cluster.

With SAS Grid Manager for Hadoop, users can define YARN queue assignments. The default scheduler in Cloudera Manager is the Fair Scheduler. Fair Scheduler promotes fairness between competing jobs. So, if it is set up correctly, no job will have to wait too long for resources, and available resources will not be idle when existing jobs need the additional resources.

Users can also submit Hive jobs using SAS Grid Manager for Hadoop. The Hive jobs will have different SLAs, so the Hadoop administrator can create different YARN queues for them. This prevents one SAS job from taking over all the resources on the cluster.

A queue example

Let’s imagine a scenario where the marketing team has created a YARN queue in Cloudera Manager that has 35GB of memory and contains three sub-queues:   

  1. Real-time/Analysis queue. This is for ad hoc queries, the jobs that need to be processed quickly. Therefore, the queue’s minimum resources are set to 20GB. This queue will be guaranteed 20GB if it’s available.
  2. Batch. This is for long-running jobs. These are usually done at night, but if a batch job runs during the day, it will have lower priority than the analysis queue. No minimum resources are defined.
  3. Exploration. This is for R&D to do testing. No minimum resources are specified, but this sub-queue has a weight of 2.
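To make the three sub-queues concrete, here is a minimal sketch that builds an allocation file in the XML format used by the Apache Hadoop Fair Scheduler. Cloudera Manager normally manages these settings for you through its interface, so treat this as an illustration of what the settings mean rather than as the file Cloudera Manager produces. The queue names (`marketing`, `analysis`, `batch`, `exploration`), the choice to model the 35GB as a maximum, and the vcore count are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MB_PER_GB = 1024
PLACEHOLDER_VCORES = 16  # assumption: the article gives no CPU figures


def add_queue(parent, name, min_gb=None, max_gb=None, weight=None):
    """Append a <queue> element with optional min/max memory and weight."""
    queue = ET.SubElement(parent, "queue", {"name": name})
    if min_gb is not None:
        # Memory is expressed in MB; 0 vcores means no CPU minimum is set.
        ET.SubElement(queue, "minResources").text = f"{min_gb * MB_PER_GB} mb, 0 vcores"
    if max_gb is not None:
        ET.SubElement(queue, "maxResources").text = (
            f"{max_gb * MB_PER_GB} mb, {PLACEHOLDER_VCORES} vcores"
        )
    if weight is not None:
        ET.SubElement(queue, "weight").text = str(float(weight))
    return queue


root = ET.Element("allocations")
marketing = add_queue(root, "marketing", max_gb=35)  # hypothetical parent queue
add_queue(marketing, "analysis", min_gb=20)          # guaranteed 20GB when available
add_queue(marketing, "batch")                        # no minimum, default weight of 1
add_queue(marketing, "exploration", weight=2)        # no minimum, weight of 2

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Leaving the batch queue without a `weight` element relies on the scheduler's default weight of 1, which is the behavior the scenario describes.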

In this example, the analysis queue has 20GB of resources, and the remaining 15GB will be distributed between the batch and exploration queues. Since the exploration queue has a weight of 2 and the batch queue has a weight of 1 (the default value), the exploration queue will receive twice as much as the batch queue. The exploration queue gets 10GB and the batch queue gets 5GB. These values can be configured differently for weekdays versus weekends, or for working hours versus non-working hours. For example, on the weekend, the batch queue can have a weight of 4 while the exploration queue gets a weight of 1. (Want to know more? Learn how to configure different options for Cloudera Manager.)
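The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the simplified model used in the post (guaranteed minimums first, then the remainder split by weight), not the scheduler's actual code. The function name `split_by_weight` is made up for illustration, and the weekend figures assume the analysis queue keeps its 20GB minimum.

```python
def split_by_weight(total_gb, guaranteed, weights):
    """Give each guaranteed queue its minimum, then split the rest by weight.

    guaranteed: queue name -> minimum GB
    weights:    queue name -> weight, for the queues sharing the remainder
    """
    remaining = total_gb - sum(guaranteed.values())
    total_weight = sum(weights.values())
    shares = dict(guaranteed)
    for name, weight in weights.items():
        shares[name] = remaining * weight / total_weight
    return shares


weekday = split_by_weight(35, {"analysis": 20}, {"batch": 1, "exploration": 2})
weekend = split_by_weight(35, {"analysis": 20}, {"batch": 4, "exploration": 1})

print(weekday)  # {'analysis': 20, 'batch': 5.0, 'exploration': 10.0}
print(weekend)  # {'analysis': 20, 'batch': 12.0, 'exploration': 3.0}
```

With the weekend weights, the same 15GB remainder tilts toward batch work: 12GB for batch and 3GB for exploration.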


These are just a few examples of different Fair Scheduler settings. As more SAS jobs are added to the system, the admin can use Cloudera Manager to modify queues to meet the new demands. Along with memory, the admin can specify CPU settings. No server restart is needed.

These activities are all transparent to the application – so, as more CPUs and memory are added, the Fair Scheduler will adjust accordingly. In the scenario above, for example, if no jobs in the analysis or exploration queues are running, then the batch jobs will use all the resources currently available. As more SAS jobs are created, the admin can create more queues to rebalance the cluster – and that is also transparent to existing applications.
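To see why idle queues do not strand resources, here is a rough sketch of weighted fair sharing among only the queues that currently have running jobs. It is loosely modeled on the idea behind the Fair Scheduler (respect minimums, split the rest by weight) and is an assumption-laden illustration, not the scheduler's implementation. The function name `fair_shares` and the queue names are hypothetical, and the sketch assumes the minimums of the active queues fit within the total.

```python
def fair_shares(total_gb, queues, active):
    """Split total_gb among active queues by weight, honoring minimums.

    queues: queue name -> (minimum GB, weight)
    active: names of queues that currently have running jobs
    """
    shares = {name: 0.0 for name in queues}
    free = {name for name in active if name in queues}  # not yet pinned to a minimum
    budget = float(total_gb)
    while free:
        weight_sum = sum(queues[name][1] for name in free)
        # Queues whose weighted share would fall below their minimum get the minimum.
        pinned = {
            name for name in free
            if budget * queues[name][1] / weight_sum < queues[name][0]
        }
        if not pinned:
            for name in free:
                shares[name] = budget * queues[name][1] / weight_sum
            break
        for name in pinned:
            shares[name] = float(queues[name][0])
            budget -= queues[name][0]
        free -= pinned
    return shares


QUEUES = {"analysis": (20, 1), "batch": (0, 1), "exploration": (0, 2)}

all_busy = fair_shares(35, QUEUES, {"analysis", "batch", "exploration"})
batch_only = fair_shares(35, QUEUES, {"batch"})

print(all_busy)    # analysis 20.0, batch 5.0, exploration 10.0
print(batch_only)  # batch gets the full 35.0; idle queues get 0.0
```

When only the batch queue is active, nothing is held back for the idle queues, which matches the behavior described above.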

YARN is a big improvement over MR1. The reason? In MR1, just one memory-intensive application like a batch job could cause analysis or exploration jobs to starve – or worse yet, fail – due to resources not being available.

Once all queues are set up, the Hadoop administrator can continue to monitor resources to make sure the cluster performs as expected as additional SAS jobs are added. To support monitoring, Cloudera Manager provides visual tools for looking at cluster health. Dig deeper if you’d like to learn more about the metrics options in Cloudera Manager.

Learn more about SAS® Grid Manager for Hadoop.

For more information about Fair Scheduler, visit Cloudera’s documentation section.


About Author

Farzana Kader

Senior Solutions Architect

Farzana Kader is a Sr. Solutions Architect at Cloudera.
