Wow did the summer fly by! Now that grad school is over, it’s about time to resume the “it’s all about the data” series. In the last several posts, I tried to lay a foundation for understanding how SAS stores and manages data for use in business intelligence and analytic applications. Now, I’d like to turn your attention to third-party database engines, which is the term (or SAS-ism) for when data is housed in non-SAS data structures. Examples include the obvious choices like Microsoft SQL Server, Oracle, MySQL, Teradata and a dozen or so additional supported third party database interfaces.
This article provides a quick overview of how SAS “talks” to third-party databases. If you are an experienced SAS programmer, you probably know much of what’s covered here. This topic has been written about extensively over the years, so while I wont go into detail on every nuance, I thought (thot!) it would be useful to provide a high level overview for those SAS administrators who are not SAS programmers but have to support the data needs of their user community. The focus here is the necessary preconditions for connectivity and the differences between explicitly passing through your query to the database versus letting SAS handle the negotiations. Let’s get started!
I can’t seem to connect to my database from SAS, what’s wrong?
Whether you are on Windows or UNIX, you need to both an ODBC driver manager and an ODBC driver to be installed. The error in the log below is for SQL Server and shows that the connection could not be made. You need to look at your ODBC.ini file (UNIX) or ODBC Administrator (Windows) and confirm that you have specified the details correctly. For more information on how to connect, take a look at the articles here for UNIX and Windows.
For Oracle, you need an Oracle client driver installed on the server, and the driver needs to know where to look for the database. This driver is usually defined in a file called tnsnames.ora that lives within the client driver directory. For more information, take a look at the articles here for connecting to Oracle databases on UNIX and Windows. Note that you can use either SAS/Access to Oracle (native driver) or SAS/Access to ODBC to make these connections from within SAS.
What is SQL Pass-thru and why should I care?
In SAS, you also have the option of using two features to access your third-party database: libname access or explicit pass-thru capabilities.
Libname access is just like it sounds – you simply use the LIBNAME statement in SAS and specify the engine along with your credentials and database specific attributes. Once the libref is instantiated, then you can use references to the library just as if it were a SAS library. Here is a quick example:
Explicit pass-thru, on the other hand, is different in that you are not letting SAS do any of the database negotiations for you. Instead, you tell SAS to pass through everything in your program directly to the database for processing. Here is the same connection to sqlsvr, this time using explicit pass-thru.
This is perhaps one of the better-documented SAS capabilities as it relates to best practice. For example, Jessica Hampton discusses SQL pass-thru in her NESUG 2011 paper "SQL Pass-Through and the ODBC Interface". Here, she compares the libname method (implicit) versus explicit pass-thru method.
There are so many options for connecting SAS to my database, which is the best?
This is a common question. After all, you can use Microsoft SQL Server using ODBC, SAS/Access to Microsoft SQL Server and OLE DB. For Oracle, you also have the option of using the native driver as well as ODBC. You’ll also see that this is true for many other database such as Teradata, mySQL, and so on. Generally the choice comes down to performance or flexibility:
- The best performance will always be the native driver for that database.
- However, if you want the flexibility of talking to lots of databases and don’t mind having to find and purchase individual ODBC drivers, then the more general SAS/Access to ODBC may be right for you.
One word of caution, however: if you use explicit SQL pass-thru, then you have to comply with the rules of the database driver that you are using. For example, there are lots of capabilities within the Oracle SQL language that are not available in ODBC. So if you want the full power of using database-specific language elements, then you will want to go with the native drivers.
What is the best option for connecting to Microsoft SQL Server?
For Microsoft SQL Server, people often ask if they should use SAS/Access to ODBC or the native SAS/Access to SQL Server engine? One of the best “how-to” articles on SAS and Microsoft SQL Server can be found in the SAS Tech Note "Accessing a Microsoft SQL Server Database from SAS on Microsoft Windows". Although this note is a bit dated, it provides a good foundation for most users.
What might be surprising to some is that the native engine is actually a Microsoft SQL Server ODBC driver that is shipped with the SAS/Access product. SAS licenses the DataDirect ODBC driver for you so that you don’t have to find and eventually purchase another ODBC driver for your environment.
The DataDirect drivers have been proven to work efficiently with SAS, and they go through extensive testing before each version of SAS is released. On the other hand, if you want the flexibility of using your own ODBC drivers or have a need to connect to other database via ODBC, then SAS/Access to ODBC might be the route for you.
In the SAS Community, you may see papers that talk about Microsoft SQL Server that use ODBC and some use the native access engine: sqlsvr. Be sure to use the engine that you have licensed as well as the syntax appropriate to that engine.
There are other methods that can be used to get data in and out of Microsoft SQL Server from SAS such as OLE DB, but I’ll leave that to you to read "Getting Started with SAS/ACCESS Interface to OLE DB" as homework.
Can I use SAS/Access to ODBC and SAS/Access to Microsoft SQL Server on the same machine?
A slightly more confusing answer comes when people want to use both SAS/Access to ODBC and SAS/Access to Microsoft SQL Server on the same machine. Without going into a detailed explanation, the short answer is “yes – you can, but you might not want to do that.”
Supporting SAS/Access products on the same machine requires that you set up the environment so that the drivers are defined correctly in the odbc.ini file and may necessitate using two startup scripts when launching SAS depending on how you want to install and configure. As long as you are precise about where you install the software and how the environment variables are instantiated, it should work.
Where do I go from here?
Obviously this topic is huge! Since there are already so many good references around this topic, I’ll just highlight a few more:
- Using SAS With a SQL Server Database (M. Rita Thissen, et al.) Introduces you to “sql” and how to perform queries specifically against Microsoft SQL Server
- Getting Started Using ODBC with SAS and SPSS (Indiana University) – A resource for students, faculty and staff that includes a very detailed primer on connecting and using ODBC data from SAS (and that other statistical software language)
- SAS ODBC Tutorial (DataDirect) – while this article focuses mainly on Oracle, the concepts and visuals are still relevant
- Database ACCESS basics: SQL and data step interface to relational databases (Ashley H. Sanders) - continues the discussion around the differences between libname access and PROC SQL methods.
- Scaling SAS® Data Access to Oracle® RDBMS (Howard Plemmons, SAS and Andrew Holdsworth, Oracle Corp) – an oldie but goodie!
- The SQL Tuning Checklist: Making Slow Database Queries a Thing of the Past (Jeff Bailey and Tatyana Petrova, SAS) – performance tuning ideas for debugging poor performance
- Optimizing Data Storage (SAS Documentation) – a good explanation of all of those SAS options for making database queries run better
For those new SAS administrators, are there other questions that you have about accessing data from third party relational database management systems?
Remember--Happy Data, Happy Users!
--greg
2 Comments
I have some remarks on some of your statements.
The oracle client does not need a tnsnames file. With SAS you can give parameters for that in the libname sql statement as follows.
path=´(DESCRIPTION=
(ADDRESS = (PROTOCOL = TCP) (HOST = ) (PORT = ) )
(CONNECT_DATA = (SID=) )
)´
This even not well documented in the SAS/Access manuals .With every type of external DBMS there are a lot of variations. , within the intelligence platform the options are is more limited.
Using the intelligence Platform should be go with the leaving the mindset of having your own machine. Today there are no “own machines” anymore in datacenters as they are tending to deliver virtualized machines. This approach is having a lot of advantages with D&R. That is relocating the virtual machine.
The Enterprise Platform of SAS is introducing a lot of servers, metadata serverm object spawner )Lev context) logical appserver , workspace server, stored process server, batch server, connect spawner. The all run mostly on a single machine. The word server is confusing as like the Windows services tasks.
Do want or need to access many types of external database for one (1) client process. It is the setting of is application server context or at the dedicated server where all is facilitated. There are a lot of usermod options for the autoexec config en OS-script settings.
This a complete other world and other approach as the local do it by yourself SAS installation on a pc.
Hi Japp
Thanks for your comments!
So like anyone who has used SAS for any length of time will quickly realize, there are at least 7 ways to do any one thing in SAS - 2 of which will be completely inefficient! I say that in jest, but the approach you provide for specifying the oracle connection details in the SAS program assumes that (a) you like typing and (b) you like hardcoding - neither of which I am a fan. There are situations where we do see this - for example, when the sys admin won't allow you to update the global tnsnames entry - but generally, I like to write once and use where we can without duplication.
Also your comment about servers is absolutely right. there are a ton of new servers and I have a distinct bias, that most everything I write about is related to "servers" in the truest sense of physical (or virtualized) multi-user server environments such as UNIX, mainframe or Windows server class machines. Since this blog entry was addressed to SAS administrators, I don't consider managing your PC as a SAS admin function unless we are referring to client deployments.
--greg