Performance tips for viewing data in SAS Enterprise Guide

SAS users love to look at data. And the data grid in SAS Enterprise Guide is a convenient way to view the contents of a data set. While small data sets can be rendered lickity-split for quick viewing, sometimes people get justifiably anxious when opening very large data. Perhaps they've been burned with a bad experience, because opening a huge transaction table can hang up a system. When dealing with very large data, it's wise to be cautious.

In general, the SAS Enterprise Guide data grid can open and show data quickly, even if the data sets are very large. This is because the data grid does not read in all of the data records at once, but reads in only the records that are needed to fill the rows of the grid. The data grid uses a "virtual mode", which allows you to scroll quickly around your data set, no matter the size.

In addition to the scroll bar, which shows you approximately where you are in the data set, there are a few keyboard shortcuts that can help you to navigate:

  • Ctrl+End - takes you to the last record, last column in the data set.
  • Ctrl+Home - takes you to the first record, first column in the data set.
  • Ctrl+G - brings up a "Go to" window, in which you specify exactly which record number to bring to the top of the view.

Note: The concept of "record number" is a specific to SAS data sets, since traditional SAS files store records sequentially. When you view a database table (accessed through a SAS/ACCESS library), the idea of "record number" might not make as much sense, since the database rules determine the sequence. However, you can still use the keyboard shortcuts to navigate your view.

Even for all of its built-in smarts, the data grid can still perform slowly in certain situations. If you're experiencing slow performance, here are some tips to help.

Understand your network latency. SAS Enterprise Guide retrieves data records from your SAS session, which is typically on another machine in your network. If your network connection is slow or the remote machine is very remote (such as in another country), fetching those records will take more time.

Avoid opening data unnecessarily. When you add data to your project, whether using File->Open Data or by running a task that produces output data, SAS Enterprise Guide will automatically open the data for viewing. Simply opening the data can cause a noticable delay when there is network latency or a slow database connection. If you're already familiar with the data and don't need to see the records every time, you can change this default behavior. From Tools->Options, select Data General and uncheck the option for "Automatically open data when added to the project."

Avoid scrolling through very wide data. As I stated earlier, large data isn't generally a problem in SAS Enterprise Guide. However, when the data are very wide, the grid can be slow. Why? Because even though the data grid reads only enough data records to fill the grid, it reads only entire records -- even if some of the variables in the record aren't visible on the screen at the time.

I define "wide" data as hundreds or thousands of variables, or maybe fewer variables that each have very large lengths. Consider these two LENGTH statements, each of which allocate 2000 variables but yield dramatically different record lengths:

/* 2000 numeric vars, 8 bytes each = 16,000 bytes per record */
/* a data set with just these fields can render quickly in SAS EG */
length x1-x2000 8;
/* 2000 character vars, 1000 bytes each = 2,000,000 bytes per record! */
/* a data set with these fields can really slow things down */
length c1-c2000 $ 1000;

Use the Data Explorer with care. The Data Explorer (added in SAS Enterprise Guide 5.1) is a popular new feature (see Marje Fecht's enthusiastic review). The Data Explorer can provide at-a-glance "Quick Stats" of variable distribution and frequency. However, the behind-the-scenes analyses that support those stats can be expensive for very large data. SAS Enterprise Guide sets some automatic constraints in order to limit the impact, but you should set your expectations accordingly.

Understand the special treatment for database tables. When you access data in a library using a SAS/ACCESS engine (for example, connecting to Oracle or Teradata), SAS Enterprise Guide automatically limits the data grid view to 10,000 records. You can configure that number in the application options, but the reason for the constraint is to limit the impact on the database connection and to keep your DBA happy (well, as happy as DBAs tend to be, anyway). 10,000 records should be enough for you to "eyeball" the shape of the data. If you need to see rows that meet specific criteria, consider using the Query Builder.

To paraphrase King Burger's Bon Qui Qui (from the famous MAD TV sketch): You can view your data your way, but don't get crazy. I hope that with a little knowledge about what happens behind the scenes, you can click through your data with more confidence.

tags: big data, data access, SAS Enterprise Guide, sas/access

Post a Comment

Your email is never published nor shared. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>