SAS Grid Manager for Hadoop nicely tied into YARN (Part 1)


If you'd like to extend your investment in Hadoop infrastructure, SAS Grid Manager for Hadoop can help by letting you colocate SAS Grid jobs on your Hadoop data nodes. This works because SAS Grid Manager for Hadoop, which is Cloudera certified, is integrated with the native components of your Hadoop ecosystem, specifically YARN and Oozie.

How does it work? The SAS Grid Manager for Hadoop conceptual architecture diagram shown below illustrates the various tiers in a complete SAS deployment. Note that the Cloudera Hadoop cluster and the SAS Grid components are colocated on the same hardware, making this both a data and a server tier. Kerberos is also a required component in this environment.

[Diagram 1: SAS Grid Manager for Hadoop conceptual architecture]

The next diagram outlines the steps between Point A (a SAS client application asks SAS Grid Manager for Hadoop to run a SAS Grid job) and Point B (YARN runs the SAS Grid job in a YARN container). In this diagram, the blue components are authored by SAS and the beige components are part of the YARN ecosystem. The SAS Grid Manager for Hadoop module and the SAS YARN AppMaster are both part of the SAS Grid Manager for Hadoop product. Let's review the details of the integration between SAS Grid Manager for Hadoop and YARN.

[Diagram 2: Steps from a SAS client request to a SAS Grid job running in a YARN container]

The following steps correspond to the numbers on the diagram above:

  1. A SAS client submits a SAS job (through SASGSUB, CONNECT, or a grid-launched workspace server) to the SAS Grid Manager for Hadoop module.
  2. The SAS Grid Manager for Hadoop module communicates with the YARN resource manager to invoke the SAS YARN application master.
  3. The SAS YARN application master requests a YARN container with the specified resources. The resources are specified in a grid policy file read by SAS Grid Manager for Hadoop. After the YARN resource manager determines which node has those resources available, the application master requests a YARN container on that grid node.
  4. YARN runs the command that launches SAS under the control of the YARN container.
  5. Once the SAS Grid job has started, the remaining SAS Grid behavior is unchanged. Because YARN tracks the resources used by the SAS Grid job, you can build a multitenant or shared Hadoop cluster.
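To make steps 3 through 5 more concrete, here is a toy Python model of resource-based placement. It is not SAS or YARN code: GRID_POLICY, submit_sas_job, the job types, and the memory and vcore figures are all hypothetical, and the application master and resource manager roles are collapsed into one function. It only illustrates the idea that the container request carries the resources named in the policy, that the job lands on a node with enough free capacity, and that those resources are then accounted for.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A grid/data node and the capacity still free on it (toy model)."""
    name: str
    free_mem_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    """Resources requested for one container."""
    mem_mb: int
    vcores: int


# Hypothetical stand-in for the grid policy file: maps a job type to the
# resources its container should request. Names and values are invented.
GRID_POLICY: Dict[str, ContainerRequest] = {
    "batch": ContainerRequest(mem_mb=4096, vcores=2),
    "interactive": ContainerRequest(mem_mb=2048, vcores=1),
}


def find_node(nodes: List[Node], req: ContainerRequest) -> Optional[Node]:
    """Simplified placement: first node with enough free memory and vcores."""
    for node in nodes:
        if node.free_mem_mb >= req.mem_mb and node.free_vcores >= req.vcores:
            return node
    return None


def submit_sas_job(job_type: str, command: str, nodes: List[Node]) -> Optional[dict]:
    """Hypothetical helper: place one job and return a description of its container."""
    req = GRID_POLICY[job_type]      # step 3: requested resources come from the policy
    node = find_node(nodes, req)
    if node is None:
        return None                  # a real resource manager would queue the request
    node.free_mem_mb -= req.mem_mb   # step 5: the used resources are accounted for,
    node.free_vcores -= req.vcores   # so later requests see the reduced capacity
    # Step 4: a real system would now run `command` inside the container on `node`.
    return {"node": node.name, "command": command,
            "mem_mb": req.mem_mb, "vcores": req.vcores}


if __name__ == "__main__":
    cluster = [Node("node1", 3000, 4), Node("node2", 8192, 8)]
    print(submit_sas_job("batch", "sas myjob.sas", cluster))
    print(submit_sas_job("interactive", "sas explore.sas", cluster))
```

In this run, the batch job skips node1 (only 3000 MB free) and lands on node2, while the smaller interactive job fits on node1. A real YARN resource manager places containers according to its configured scheduler rather than a simple first-fit scan.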

The bottom line: SAS Grid Manager for Hadoop lets organizations manage diverse workloads on a shared Hadoop cluster. By using YARN, long-running processes can run alongside batch and ad hoc processes without contending for resources. However, each workload has its own service level agreement (SLA). For example, a nightly batch job that is still running during the day should not stop an important real-time analysis job from running, and it should not significantly slow the exploration jobs performed by R&D.
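One way to picture SLA-aware sharing is a weighted split of cluster capacity. YARN's own schedulers handle this through configurable queues; the Python sketch below is not YARN's algorithm but a simplified weighted max-min ("water-filling") split, and the workload names, weights, and vcore counts are hypothetical. It shows how a weighted share lets a large batch workload absorb spare capacity without crowding out a smaller, higher-priority real-time workload.

```python
from typing import Dict


def weighted_shares(total: float, weights: Dict[str, float],
                    demands: Dict[str, float]) -> Dict[str, float]:
    """Weighted max-min ("water-filling") split of `total` capacity.

    Each workload receives capacity in proportion to its weight, but never more
    than it demands; capacity a workload does not need is redistributed to the
    remaining workloads, again by weight. Weights must be positive.
    """
    shares = {name: 0.0 for name in weights}
    active = {name for name in weights if demands[name] > 0}
    remaining = float(total)
    while active and remaining > 1e-9:
        total_weight = sum(weights[n] for n in active)
        offers = {n: remaining * weights[n] / total_weight for n in active}
        # Workloads whose full demand fits within their offer are satisfied now.
        satisfied = {n for n in active if demands[n] <= offers[n]}
        if not satisfied:
            # Every workload wants more than its offer: hand out the offers and stop.
            for n in active:
                shares[n] = offers[n]
            break
        for n in satisfied:
            shares[n] = float(demands[n])
            remaining -= demands[n]
        active -= satisfied
    return shares


if __name__ == "__main__":
    weights = {"realtime": 3, "exploration": 2, "batch": 1}
    print(weighted_shares(100, weights, {"realtime": 20, "exploration": 50, "batch": 100}))
    print(weighted_shares(100, weights, {"realtime": 60, "exploration": 50, "batch": 100}))
```

With 100 vcores and weights of 3:2:1, the first call gives the real-time workload its full 20, exploration its 50, and the batch job the remaining 30. In the second call every workload wants more than its weighted offer, so the batch job, despite demanding 100 vcores, is held to about 16.7 while the real-time workload keeps 50.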


Watch for the next post in this series to hear from Farzana Kader about how Cloudera Manager and YARN can be used to manage these different workloads.


About Author

Cheryl Doninger

Sr Director, Research and Development

Cheryl Doninger is a Senior Director in SAS’ Research and Development division. She is responsible for guiding development of a variety of SAS server and compute technologies as well as optimizing the ways in which SAS software interacts with IT. Under Cheryl’s leadership, SAS works with strategic partners to incorporate technology advances with SAS software to improve the performance and scalability of all SAS products and solutions. Since joining SAS in 1986, Cheryl has played a key role in setting the company’s technical direction in scalability, grid and cloud computing. She is also a multi-patent holder in the area of grid computing. Cheryl earned a bachelor's degree in Computer Science from Bowling Green State University and a master's degree in Computer Studies from North Carolina State University.
