So, with the simple introduction in Understanding Hadoop security, configuring Kerberos with Hadoop alone looks relatively straightforward. Your Hadoop environment sits in isolation within a separate, independent Kerberos realm with its own Kerberos Key Distribution Center. End users can happily type commands as they log into a machine hosting the Hadoop clients. From the host machine they can run processing against the Hadoop services.
But how does SAS fit into this picture? Where will the SAS servers and clients be located in relation to the Hadoop Kerberos realm? This post provides more insight into the second of the four key practices for securing a SAS-Hadoop environment:
Simplify Kerberos setup by placing SAS and Hadoop within the same topological realm.
After reading this post, a coworker told me that botanists have a term that fits this concept perfectly: monoecious, from the Greek for “one household”. Some trees, such as hollies and ginkgos, bear male and female flowers on separate plants. Most plants, however, are monoecious, and keeping the important elements in close proximity makes the connections of life much simpler. Here’s why that works for SAS-Hadoop-Kerberos too!
What happens if SAS and Hadoop are in different realms
Few SAS and Hadoop environments are installed at the same time; usually one of them already exists. If you have an existing SAS environment in your corporate realm and you’ve just followed your Hadoop provider’s instructions for configuring Kerberos, you’ll probably have the setup in Figure 1. SAS server and user authentication happen in the corporate realm, while access to Hadoop services is governed by the Kerberos Key Distribution Center in the Hadoop realm.
However, the major thing missing from this environment is represented by the green arrow at the top of the diagram: a trust relationship between the Corporate Domain and the new Hadoop Realm. A domain administrator must create this trust and define how users in one realm map to users in the other. Without at least a one-way trust, SAS will not be able to interact with Hadoop at all. This topology is one of the more complex arrangements, because SAS administrators and their IT departments must set up all the domain trusts represented by that little green arrow.
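To make the trust more concrete: in Kerberos, one realm trusts another through a shared cross-realm ticket-granting principal. For users in the corporate realm to reach services in the Hadoop realm, both KDCs need the principal `krbtgt/<HADOOP REALM>@<CORPORATE REALM>` with the same key. The minimal sketch below assumes hypothetical realm names `CORP.EXAMPLE.COM` and `HADOOP.EXAMPLE.COM`, considers only direct trusts (no transitive or `[capaths]` routing), and uses a simplified stand-in for user mapping that is not Hadoop's actual `auth_to_local` rule syntax.

```python
from typing import Optional

# Hypothetical realm names, used only for illustration.
CORP = "CORP.EXAMPLE.COM"
HADOOP = "HADOOP.EXAMPLE.COM"


def cross_realm_principal(client_realm: str, service_realm: str) -> str:
    """Cross-realm TGS principal a client in client_realm uses to reach
    service_realm. Both KDCs must hold this principal with the same key."""
    return f"krbtgt/{service_realm}@{client_realm}"


def tgs_principal_for(client_realm: str, service_realm: str,
                      trusts: set[tuple[str, str]]) -> Optional[str]:
    """Return the krbtgt principal needed to get a service ticket, or None
    if the service realm does not trust the client realm.

    trusts holds (trusting_realm, trusted_realm) pairs; only direct trusts
    are considered in this sketch.
    """
    if client_realm == service_realm:
        # Same realm: the client's ordinary TGT is enough.
        return f"krbtgt/{client_realm}@{client_realm}"
    if (service_realm, client_realm) in trusts:
        return cross_realm_principal(client_realm, service_realm)
    return None


def to_short_name(principal: str, accepted_realms: set[str]) -> Optional[str]:
    """Simplified principal-to-user mapping (hypothetical, not auth_to_local):
    strip the realm if it is accepted, otherwise reject the principal."""
    name, _, realm = principal.partition("@")
    if realm not in accepted_realms:
        return None
    return name.split("/")[0]


if __name__ == "__main__":
    trusts = {(HADOOP, CORP)}  # one-way: the Hadoop realm trusts the corporate realm
    print(tgs_principal_for(CORP, HADOOP, trusts))  # krbtgt/HADOOP.EXAMPLE.COM@CORP.EXAMPLE.COM
    print(tgs_principal_for(HADOOP, CORP, trusts))  # None: the trust only runs one way
    print(to_short_name("alice@CORP.EXAMPLE.COM", {CORP, HADOOP}))  # alice
```

The one-way direction matters: corporate users can obtain tickets for Hadoop services, but nothing in this arrangement lets Hadoop-realm principals obtain tickets for corporate services.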
Once the trusts are established, additional steps are needed to ensure back-end Kerberos authentication for SAS processes running in the corporate realm. To access Hadoop services, a SAS process needs a valid Ticket Granting Ticket (TGT), so ideally the operating system should be configured to perform the kinit step that obtains it. Without a TGT, the SAS process cannot request a Service Ticket for the Hadoop service and so cannot authenticate.
The simplest option for SAS administrators is to perform this step on the host running the SAS process as part of session initialization. The SAS session itself is still launched normally: for example, in SAS Enterprise Guide the end user still enters a valid user name and password into the connection profile. The kinit performed during session initialization then sets up back-end Kerberos authentication between the SAS process and the Hadoop services.
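As a rough illustration of that session-initialization step, the sketch below builds and runs an MIT Kerberos `kinit` command for the user who just authenticated, writes the TGT to a per-session credentials cache with `-c`, and returns the `KRB5CCNAME` value the SAS process should inherit. The realm, the cache path pattern, and the `init_kerberos_session` helper are hypothetical; a real deployment would hook this into its own session startup mechanism. Feeding the password on standard input assumes a kinit build that accepts a non-terminal stdin; the keytab path (`-k -t`) avoids that assumption.

```python
import subprocess
from typing import Optional

# Hypothetical realm name for illustration.
REALM = "HADOOP.EXAMPLE.COM"


def kinit_command(user: str, realm: str, ccache: str,
                  keytab: Optional[str] = None) -> list[str]:
    """Build an MIT kinit argument list that stores the TGT in ccache."""
    cmd = ["kinit", "-c", ccache]
    if keytab:
        # Non-interactive: read the principal's key from a keytab file.
        cmd += ["-k", "-t", keytab]
    cmd.append(f"{user}@{realm}")
    return cmd


def init_kerberos_session(user: str, password: Optional[str], session_id: str,
                          realm: str = REALM, keytab: Optional[str] = None,
                          run: bool = True) -> dict[str, str]:
    """Hypothetical session-start hook: obtain a TGT, then return the
    environment variables the SAS process should inherit."""
    ccache = f"FILE:/tmp/krb5cc_{user}_{session_id}"
    cmd = kinit_command(user, realm, ccache, keytab)
    if run:
        # Assumption: kinit reads the password from stdin when no keytab is used.
        subprocess.run(cmd, input=None if keytab else password,
                       text=True, check=True, capture_output=True)
    return {"KRB5CCNAME": ccache}


if __name__ == "__main__":
    # Dry run: show the command and environment without contacting a KDC.
    print(kinit_command("alice", REALM, "FILE:/tmp/krb5cc_alice_42"))
    print(init_kerberos_session("alice", "secret", "42", run=False))
```

The caller would merge the returned mapping into the SAS process environment (for example with `os.environ.update`), so that Hadoop client libraries started by that process find the cached TGT.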
Placing SAS and Hadoop in the same realm
An alternative to setting up the domain trusts above is to move the SAS Servers and SAS High-Performance Analytics nodes into the same “household” as the Hadoop Key Distribution Center, as shown in Figure 2. In this configuration, the end user logs into the corporate realm and launches a SAS session by entering a user name and password into a SAS client. The same credentials used to start SAS Enterprise Guide, for example, are also valid in the Hadoop realm.
Authentication now takes place in the joint SAS-Hadoop realm without additional mapping required. The SAS servers and SAS High-Performance Analytics nodes can interact with the same Kerberos Key Distribution Center as the Hadoop services because all the components are within the same Kerberos realm.
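In the joint-realm topology, every SAS server, SAS High-Performance Analytics node, and Hadoop node can share one Kerberos client configuration that names a single default realm and KDC, with no `[capaths]` section and no cross-realm principals. The sketch below renders such a minimal `krb5.conf`; the realm, KDC host, and domain are hypothetical placeholders, and a production file will usually carry further settings from your Hadoop provider's instructions.

```python
def render_krb5_conf(realm: str, kdc: str, admin_server: str, domain: str) -> str:
    """Render a minimal single-realm krb5.conf shared by SAS and Hadoop hosts."""
    return "\n".join([
        "[libdefaults]",
        f"  default_realm = {realm}",
        "",
        "[realms]",
        f"  {realm} = {{",
        f"    kdc = {kdc}",
        f"    admin_server = {admin_server}",
        "  }",
        "",
        "[domain_realm]",
        # Map every host in the domain (leading dot) and the bare domain to the realm.
        f"  .{domain} = {realm}",
        f"  {domain} = {realm}",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical names for the joint SAS-Hadoop realm.
    print(render_krb5_conf("HADOOP.EXAMPLE.COM", "kdc.hadoop.example.com",
                           "kdc.hadoop.example.com", "hadoop.example.com"))
```

Because SAS and Hadoop hosts resolve to the same realm, a TGT obtained at SAS login can be exchanged directly for Hadoop service tickets at the same KDC.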
This topology greatly simplifies the Kerberos setup for the SAS components. Kerberos authentication within the Hadoop realm is straightforward. The only remaining complexity arises if the customer requires end-to-end Kerberos authentication, in which the SAS session itself is launched using Kerberos and Kerberos authentication extends from the user’s desktop through to the Hadoop services.
Where to find more information
The following architecture documents offer guidelines for ensuring that your SAS-Hadoop environment is not only secure but also offers faster response times.
- Cross-Realm Trust Interoperability, MIT Kerberos and AD – a Red Hat paper discussing the issues involved.
- Hadoop with Kerberos—Architecture Considerations – SAS best practices that include a checklist of pre-installation questions.