Performance tips for viewing data in SAS Enterprise Guide

SAS users love to look at data. And the data grid in SAS Enterprise Guide is a convenient way to view the contents of a data set. While small data sets can be rendered lickity-split for quick viewing, sometimes people get justifiably anxious when opening very large data. Perhaps they've been burned with a bad experience, because opening a huge transaction table can hang up a system. When dealing with very large data, it's wise to be cautious.

In general, the SAS Enterprise Guide data grid can open and show data quickly, even if the data sets are very large. This is because the data grid does not read in all of the data records at once, but reads in only the records that are needed to fill the rows of the grid. The data grid uses a "virtual mode", which allows you to scroll quickly around your data set, no matter the size.

datagrid
In addition to the scroll bar, which shows you approximately where you are in the data set, there are a few keyboard shortcuts that can help you to navigate:

  • Ctrl+End - takes you to the last record, last column in the data set.
  • Ctrl+Home - takes you to the first record, first column in the data set.
  • Ctrl+G - brings up a "Go to" window, in which you specify exactly which record number to bring to the top of the view.

Note: The concept of "record number" is a specific to SAS data sets, since traditional SAS files store records sequentially. When you view a database table (accessed through a SAS/ACCESS library), the idea of "record number" might not make as much sense, since the database rules determine the sequence. However, you can still use the keyboard shortcuts to navigate your view.

Even for all of its built-in smarts, the data grid can still perform slowly in certain situations. If you're experiencing slow performance, here are some tips to help.

Understand your network latency. SAS Enterprise Guide retrieves data records from your SAS session, which is typically on another machine in your network. If your network connection is slow or the remote machine is very remote (such as in another country), fetching those records will take more time.

Avoid opening data unnecessarily. When you add data to your project, whether using File->Open Data or by running a task that produces output data, SAS Enterprise Guide will automatically open the data for viewing. Simply opening the data can cause a noticable delay when there is network latency or a slow database connection. If you're already familiar with the data and don't need to see the records every time, you can change this default behavior. From Tools->Options, select Data General and uncheck the option for "Automatically open data when added to the project."

Avoid scrolling through very wide data. As I stated earlier, large data isn't generally a problem in SAS Enterprise Guide. However, when the data are very wide, the grid can be slow. Why? Because even though the data grid reads only enough data records to fill the grid, it reads only entire records -- even if some of the variables in the record aren't visible on the screen at the time.

I define "wide" data as hundreds or thousands of variables, or maybe fewer variables that each have very large lengths. Consider these two LENGTH statements, each of which allocate 2000 variables but yield dramatically different record lengths:

/* 2000 numeric vars, 8 bytes each = 16,000 bytes per record */
/* a data set with just these fields can render quickly in SAS EG */
length x1-x2000 8;
 
/* 2000 character vars, 1000 bytes each = 2,000,000 bytes per record! */
/* a data set with these fields can really slow things down */
length c1-c2000 $ 1000;

Use the Data Explorer with care. The Data Explorer (added in SAS Enterprise Guide 5.1) is a popular new feature (see Marje Fecht's enthusiastic review). The Data Explorer can provide at-a-glance "Quick Stats" of variable distribution and frequency. However, the behind-the-scenes analyses that support those stats can be expensive for very large data. SAS Enterprise Guide sets some automatic constraints in order to limit the impact, but you should set your expectations accordingly.

Understand the special treatment for database tables. When you access data in a library using a SAS/ACCESS engine (for example, connecting to Oracle or Teradata), SAS Enterprise Guide automatically limits the data grid view to 10,000 records. You can configure that number in the application options, but the reason for the constraint is to limit the impact on the database connection and to keep your DBA happy (well, as happy as DBAs tend to be, anyway). 10,000 records should be enough for you to "eyeball" the shape of the data. If you need to see rows that meet specific criteria, consider using the Query Builder.

To paraphrase King Burger's Bon Qui Qui (from the famous MAD TV sketch): You can view your data your way, but don't get crazy. I hope that with a little knowledge about what happens behind the scenes, you can click through your data with more confidence.

tags: big data, data access, SAS Enterprise Guide, sas/access

6 Comments

  1. Ritesh
    Posted August 10, 2014 at 8:04 pm | Permalink

    Hello Chris,

    Great reading all your posts, very informative. Wanted to ask you, if it is best practice to register SAS lbraries in SAS SMC for the SAS users at my site. Or is it better if every user has their own user id and password to connect to their data sources like Oracle, DB2, SQL etc.

    Please let me know, appreciate your opinion.
    Ritesh

    • Chris Hemedinger Chris Hemedinger
      Posted August 11, 2014 at 8:01 am | Permalink

      Ritesh,

      Thanks for the kind words.

      Regarding your question, I don't think this is an "either/or" situation. If you use DBMS libraries at your site, it's best to provide access control at the database level. Whether that's by using group permissions or individual accounts -- that's up to you and the DBAs. But it's not the recommended practice to rely on SAS metadata to "secure" access to these DBMS resources.

      That said, there are advantages to registering libraries and tables in SAS metadata. When you have registered tables, you can organize the tables into the SAS Folder structure that might make them easier to find and work with. Also, you can then use these tables as the basis for SAS information maps and reports in SAS Web Report Studio.

  2. Ritesh
    Posted August 11, 2014 at 9:59 am | Permalink

    Chris thk u for the opinion.

    Our site has 400 users and each user having their own access to the DBMS libraries was being an issue. We have SAS EG, SAS DI. We are slowly moving to a Teradata shop and was planning to follow this SAS link which suggests this preferred method:

    http://support.sas.com/resources/papers/93286_TS808.pdf

    "As a SAS software user, you might need to query a Teradata table from SAS®
    business-intelligence (BI) clients. There are several ways that you can access a Teradata database from a BI client application. However, the preferred method, which is described in this document, is to define a Teradata library in SAS® 9.2 Management Console or later. SAS
    Management Console provides a point of control for security management of this library."

    Please let us know, Thanks, Ritesh

    • Chris Hemedinger Chris Hemedinger
      Posted August 11, 2014 at 11:08 am | Permalink

      Ritesh -- yes, that's a good document with good guidance. Note that it also lays out your possibilities: use a group login or use individual accounts, each of which can be managed using Auth Domains in SAS Metadata. If you use a group account, then anyone who knows the connection details and credentials can access the library via SAS code, even if the library definition is "hidden" and controlled with SAS Metadata permissions.

  3. Jim Bowman
    Posted August 28, 2014 at 12:58 pm | Permalink

    I just upgraded to SAS 9.4 and EPG 6.100

    When I open a node in a project that had previously been executed it now seems to take an extremely long time to open and even tells me "Not Responding" for up to a minute or so.
    Then the program appears and works fine. Until I go to another node and return to that node.
    Any ideas?

    • Chris Hemedinger Chris Hemedinger
      Posted August 28, 2014 at 1:20 pm | Permalink

      Jim,

      It's certainly not supposed to be slow...so something is wrong.

      Is this a program node? I know that there was a hotfix for some specific performance issues (and a few other bugs) in July -- have you applied it?

      Otherwise -- is the program large? Is it stored in the project or on the file system, and if the file system -- is it local or remote? You should contact SAS Tech Support for better diagnosis and steps.

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>