Help, I lost my SAS server again!

In my last post, I introduced the hardware solutions (such as a virtual IP switch or IP load balancer) that enable client applications to access services regardless of whether they are running on a primary or a failover server in a grid-enabled environment configured with high availability. In this post, I’ll detail the use of DNS resolution to ensure access to SAS servers.

About DNS resolution

Every client uses DNS resolution to translate the name of the server where a service is running into that server's IP address. In a high-availability scenario, the environment is usually configured to use aliases instead of real server names, such as meta_alias.exnet.xyz.com instead of sgcwin071.exnet.xyz.com or sgcwin072.exnet.xyz.com in the graphic below.
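Here is a minimal Python sketch of what any client effectively does, using only the standard `socket` module: it looks up the alias and connects to whatever address DNS returns. The function names are hypothetical, and port 5555 is the SAS Metadata Server port used later in this post. SAS clients perform their own lookups internally, so this only illustrates the mechanism.

```python
import socket

# Port of the SAS Metadata Server in this post's example.
METADATA_PORT = 5555


def resolve_service(alias: str, port: int) -> list[str]:
    """Return the IPv4 addresses that DNS currently maps to the alias."""
    infos = socket.getaddrinfo(alias, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def connect_to_service(alias: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection by name; the DNS lookup happens inside create_connection."""
    # The client only ever sees the alias, never sgcwin071 or sgcwin072.
    return socket.create_connection((alias, port), timeout=timeout)
```

Inside the example network, `resolve_service("meta_alias.exnet.xyz.com", 5555)` would return the address of whichever physical host currently carries the alias.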

The corporate DNS does not know which of the two or more possible hosts the SAS services are running on (there is no hardware load balancer in this scenario), so the software solution needs some way to integrate with the corporate DNS in order to return the correct IP address.

With SAS Grid Manager, this integration is provided by EGO itself, or more specifically by a component called EGO Service Director, which can return the correct IP address in a couple of different ways. The key factor in choosing the appropriate configuration is whether EGO Service Director is allowed to send dynamic updates to the corporate DNS server.

Enabling dynamic corporate DNS updates

Dynamic updates may conflict with your organization’s IT policies: a compromised corporate DNS may bring down the whole network, so this option may not be appropriate in many settings. If your policies do allow dynamic updates, EGO Service Director is granted write access to the corporate DNS. The alias that points to the host running EGO Service Director (the named process) is kept up to date in the corporate DNS, and as soon as EGO starts SAS services on a host, the aliases for those services are written to the EGO DNS database.

What happens when there is a server failure? In the following example, the SAS server sgcwin071 failed. Once EGO starts the application on the failover server, it sends the address of this new host to the corporate DNS server. The entry for the meta_alias is updated in the DNS server, so when SAS Management Console makes a request to connect to the SAS Metadata Server on meta_alias on port 5555, the DNS server returns the address of the failover host sgcwin072.
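The post does not say which protocol EGO Service Director uses for these updates. Assuming it is a standard DNS dynamic update as defined in RFC 2136, the message sent after a failover could look like the following Python sketch, which deletes the existing A record of the alias and adds one for the failover host. The zone name, the alias, and the address 192.0.2.72 for sgcwin072 are illustrative assumptions. The message is also unsigned, whereas production DNS servers usually require updates to be authenticated (for example with TSIG or, on Active Directory, secure dynamic updates).

```python
import socket
import struct

# Constants from RFC 1035 (DNS) and RFC 2136 (dynamic update).
OPCODE_UPDATE = 5
TYPE_A = 1
TYPE_SOA = 6
CLASS_IN = 1
CLASS_ANY = 255


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels, without compression."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_a_record_update(msg_id: int, zone: str, alias: str, new_ip: str, ttl: int = 60) -> bytes:
    """Build an unsigned RFC 2136 UPDATE that repoints the alias's A record to new_ip."""
    flags = OPCODE_UPDATE << 11  # QR=0 (request), opcode=UPDATE, all other bits zero
    # Header: ID, flags, ZOCOUNT=1, PRCOUNT=0, UPCOUNT=2, ADCOUNT=0
    header = struct.pack("!6H", msg_id, flags, 1, 0, 2, 0)
    # Zone section: the zone being updated, with type SOA and class IN.
    zone_section = encode_name(zone) + struct.pack("!2H", TYPE_SOA, CLASS_IN)
    owner = encode_name(alias)
    # Update 1: delete the whole A RRset of the alias (class ANY, TTL 0, no RDATA).
    delete_rrset = owner + struct.pack("!2HIH", TYPE_A, CLASS_ANY, 0, 0)
    # Update 2: add an A record that points at the failover host.
    rdata = socket.inet_aton(new_ip)
    add_record = owner + struct.pack("!2HIH", TYPE_A, CLASS_IN, ttl, len(rdata)) + rdata
    return header + zone_section + delete_rrset + add_record
```

For example, `build_a_record_update(0x1234, "exnet.xyz.com", "meta_alias.exnet.xyz.com", "192.0.2.72")` produces the raw message; a real updater would then send it over UDP or TCP to port 53 of the zone's primary server, which this sketch deliberately does not do.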

Enabling DNS resolution with EGO DNS server

A more common option is to configure EGO Service Director as a stand-alone DNS server, which can serve as the authoritative name server for the SAS subdomain and respond to DNS queries for the high-availability SAS services it manages.

The virtual hostnames for the EGO high-availability services will always be in a subdomain of the corporate DNS domain. For instance, if the corporate domain were exnet.xyz.com, then all virtual hostnames for EGO high-availability services would be in the subdomain ego.exnet.xyz.com by default. It is important that the corporate DNS server be configured with multiple Name Server records for the EGO subdomain, one for each of the redundant nodes that can possibly execute the EGO DNS server.
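The delegation itself is ordinary DNS configuration. As a minimal sketch, assuming the nodes that can run the EGO DNS server are sgcwin071 and sgcwin072 (the post refers to them only as IP1 to IP3), the hypothetical Python helper `delegation_records` below produces the NS lines for the parent zone in standard zone-file syntax. Rejecting a single-node list is a design choice of the sketch, reflecting the point that every redundant node needs its own NS record.

```python
def delegation_records(parent: str, subdomain: str, name_servers: list[str], ttl: int = 3600) -> list[str]:
    """Return zone-file NS lines that delegate subdomain to each listed name server."""
    parent, subdomain = parent.rstrip("."), subdomain.rstrip(".")
    if not subdomain.endswith("." + parent):
        raise ValueError(f"{subdomain} is not a subdomain of {parent}")
    if len(name_servers) < 2:
        # A single NS record would tie the subdomain to one node and defeat the redundancy.
        raise ValueError("list every node that can run the EGO DNS server")
    # One NS record per node, as fully qualified names with a trailing dot.
    return [f"{subdomain}. {ttl} IN NS {ns.rstrip('.')}." for ns in sorted(name_servers)]
```

Because these name-server hosts live in the parent zone exnet.xyz.com, they already have A records there, so no glue records are needed for the delegated subdomain.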

These are the steps that SAS Management Console follows in order to connect to the SAS Metadata Server in this scenario:

SAS Management Console makes a request to connect to the SAS Metadata Server on meta_alias.ego.exnet.xyz.com on port 5555.
The corporate DNS finds in its internal table that all queries for names of the form *.ego.exnet.xyz.com are to be delegated to another DNS server, the EGO DNS, running on IP1, IP2, or IP3.
The EGO DNS receives the query and responds to SAS Management Console with the physical IP address for the meta_alias name, which is bound to the physical server where the SAS Metadata Server is running.
The connection request is properly routed to the sgcwin071.exnet.xyz.com host, where the SAS Metadata Server is running.
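The four steps above can be modeled in a few lines of Python. This is a simplified, hypothetical model rather than how a real resolver works: the corporate DNS is reduced to a dictionary, delegation is reduced to a suffix check on *.ego.exnet.xyz.com, and a node that is not running the EGO DNS server is represented by `None` instead of a network timeout. The IP addresses are placeholders from the documentation range 192.0.2.0/24.

```python
from typing import Optional

# Hypothetical records held by the corporate DNS for ordinary hosts.
CORPORATE_RECORDS = {
    "sgcwin071.exnet.xyz.com": "192.0.2.71",
    "sgcwin072.exnet.xyz.com": "192.0.2.72",
}
EGO_SUBDOMAIN = "ego.exnet.xyz.com"


class EgoDnsServer:
    """Stand-in for the EGO DNS: answers only for the aliases it currently manages."""

    def __init__(self, records: dict[str, str]):
        self.records = dict(records)

    def answer(self, name: str) -> Optional[str]:
        return self.records.get(name)


def in_subdomain(name: str, subdomain: str) -> bool:
    return name == subdomain or name.endswith("." + subdomain)


def resolve(name: str, ego_nodes: dict[str, Optional[EgoDnsServer]]) -> Optional[str]:
    """Corporate-DNS view: delegate *.ego names, answer everything else locally."""
    if not in_subdomain(name, EGO_SUBDOMAIN):
        return CORPORATE_RECORDS.get(name)
    # Step 2: ego_nodes maps each NS node IP (IP1, IP2, ...) to the EGO DNS
    # running there, or None; try them until one answers.
    for server in ego_nodes.values():
        if server is not None:
            # Step 3: the EGO DNS returns the physical IP bound to the alias.
            return server.answer(name)
    return None  # no EGO DNS instance reachable
```

Step 4, the actual connection, then goes to the returned address exactly as it would for any other host.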

If SAS server sgcwin071 fails, EGO starts the managed service on the failover server, then it updates its internal DNS table with the address of the failover machine for the meta_alias name. When a new instance of SAS Management Console makes a request to connect to the SAS Metadata Server on meta_alias on port 5555, EGO DNS server returns the address of the failover machine, sgcwin072, as shown below.
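The failover itself only changes one entry in the EGO DNS table. The following hypothetical Python model (the class and function names are illustrative, not EGO interfaces) shows that after the service moves to sgcwin072, a fresh lookup of the same alias returns the new address while the name the client uses never changes. Picking the first non-failed candidate is a simplification; EGO's actual placement policy is not described in this post.

```python
class ServiceDirectorModel:
    """Hypothetical model of the alias table kept by EGO Service Director."""

    def __init__(self, host_ips: dict[str, str]):
        self.host_ips = dict(host_ips)   # physical host -> IP address
        self.table: dict[str, str] = {}  # alias -> IP of the host running the service

    def service_started(self, alias: str, host: str) -> None:
        # Called when EGO starts the managed service on a host.
        self.table[alias] = self.host_ips[host]

    def lookup(self, alias: str) -> str:
        return self.table[alias]


def fail_over(director: ServiceDirectorModel, alias: str, candidates: list[str], failed: str) -> str:
    """Restart the service on the first candidate that is not the failed host."""
    for host in candidates:
        if host != failed:
            director.service_started(alias, host)
            return host
    raise RuntimeError(f"no failover host available for {alias}")
```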

For the software solution, the choice between using the corporate DNS directly and implementing an EGO DNS server is usually determined by IT governance policies. In addition, both software solutions outlined above share a drawback. By default, Windows clients cache DNS entries for several minutes, so they do not get the new IP address until the cached entry expires, and during that time they cannot connect to the failover host. To prevent this issue, SAS administrators must disable the DNS cache on all Windows clients. The trade-off is extra DNS traffic, because every lookup then goes to the DNS server.
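The caching drawback is easy to reproduce in a toy model. The Python sketch below is a simplified stand-in for a client resolver cache, not the Windows DNS Client implementation: it caches each answer for a fixed number of seconds and accepts an injectable clock so the timeline can be simulated. A TTL of 0 plays the role of a disabled cache.

```python
import time
from typing import Callable


class CachingResolver:
    """Simplified client-side DNS cache with a fixed TTL (0 disables caching)."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup  # function that queries the DNS server
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (address, expiry time)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # may be stale if a failover happened meanwhile
        address = self._lookup(name)  # cache miss or expired: ask the DNS server
        if self._ttl > 0:
            self._cache[name] = (address, now + self._ttl)
        return address
```

With a 300-second TTL, a lookup made 100 seconds after the failover still returns the address of sgcwin071; only after the entry expires does the client learn about sgcwin072. With the cache disabled, every call goes to the DNS server, which is exactly the extra traffic described above.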

Comparison of hardware and software solutions

The following table shows a comparison of the hardware and software solutions:

Conclusion
The method used to resolve the virtual hostname to the SAS service’s current physical location is completely hidden from the client: the client connects to the virtual hostname exactly as it would to any other host. The hardware and software solutions differ only in how the virtual hostname is resolved. With either solution, the fundamental concepts are the same:

Define a virtual hostname for the services that are to be high-availability within the grid.
Any client wishing to access a high-availability grid service must use the virtual hostname.
The virtual hostname is resolved to the current physical location of the service within the grid.

You can find more detailed configuration information about EGO Service Director and DNS integration in the document High Availability Services with SAS Grid Manager.
