Determining the size of a SAS data set

8
Proc SQL: Beyond the Basics Using SAS

When developing SAS® data sets, program code and/or applications, efficiency is not always given the attention it deserves, particularly in the early phases of development. Since data sizes and system performance can affect a program and/or an application’s behavior, SAS users may want to access information about a data set’s content and size. To access, for example, how much disk space a data set is using, users can make a few calculations and/or learn how to access metadata content to obtain the information to this important information. This tip explores a few ways to determine, or estimate, the size of a data set – a question many users are curious about when discussing SAS performance and tuning techniques. You'll find more tips like this one in my book, PROC SQL: Beyond the Basics Using SAS, Second Edition.

Using PROC SQL and DICTIONARY.TABLES

The SAS System collects valuable information (known as “metadata”) about all known SAS libraries, data sets (tables), catalogs, indexes, macros, system options, views and a collection of other “read-only” tables called Dictionary tables and SASHELP views. One specific Dictionary table, TABLES, and its SASHELP view equivalent, VTABLE, contains details about a SAS session’s data sets. In the following PROC SQL code, the specification of a PROC SQL SELECT-clause is illustrated to access the contents of four columns found in the TABLES Dictionary table, specifically LIBNAME, MEMNAME, MEMTYPE and FILESIZE to display the size of the CARS data set.

PROC SQL and Dictionary.TABLES:

PROC SQL ;
  TITLE ‘Filesize for CARS Data Set’ ;
  SELECT LIBNAME,
         MEMNAME,
         FILESIZE FORMAT=SIZEKMG.,
         FILESIZE FORMAT=SIZEK.
    FROM DICTIONARY.TABLES
      WHERE LIBNAME = ‘SASHELP’
        AND MEMNAME = ‘CARS’
        AND MEMTYPE = ‘DATA’ ;
QUIT ;

Results

Size of a SAS data set

Analysis

As shown in the results, above, the CARS data set filesize is 192KB.  Note: When the SIZEKMG. format is specified in a format= option, SAS determines whether to apply KB for kilobytes, MB for megabytes, or GB for gigabytes; and divides the numeric filesize value by one of the following values:

KB           1024
MB          1048576
GB           1073741824

Using PROC PRINT and SASHELP.VTABLE

In the next example, the specification of a PROC PRINT is illustrated to access the contents of three columns found in the VTABLE SASHELP view, specifically LIBNAME, MEMNAME and FILESIZE to display the size of the CARS data set.

PROC PRINT and SASHELP.VTABLE

PROC PRINT DATA=SASHELP.VTABLE NOOBS ;
  VAR LIBNAME MEMNAME FILESIZE ;
  WHERE LIBNAME = ‘SASHELP’
    AND MEMNAME = ‘CARS’ ;
  FORMAT FILESIZE SIZEKMG. ;
  TITLE ‘Filesize for SASHELP.CARS Data Set’ ;
RUN ;

Results

Size of a SAS data set

Using DATA _NULL_, SASHELP.VEXTFL and CALL SYMPUTX

In the final example, a DATA _NULL_ is illustrated to access the contents of the VEXTFL SASHELP view with a FILENAME statement. An assignment statement is specified to calculate the FILESIZE value for the size of the CARS data set. The CALL SYMPUTX left justifies and trims the trailing blanks from the numeric FILESIZE value of 196608.

DATA_NULL_and SASHELP.VEXTFL

filename myfile 'C:\Program Files\SAS9.4\SASFoundation\9.4\\CORE\SASHELP\Cars.sas7bdat' ;
DATA _NULL_ ;
  SET SASHELP.VEXTFL (WHERE=(FILEREF=’MYFILE’)) ; 
  /* Calculate the Filesize in MB */
  FILESIZE = FILESIZE / (1024 ** 2) ;
  CALL SYMPUTX (‘FILESIZE’,FILESIZE) ;
RUN ;

Results

Size of a SAS data set

Want more SAS tips? Check out these useful tips and tricks from several other subject matter experts.

Share

About Author

Kirk Paul Lafler

Consultant and founder of Software Intelligence Corporation

Kirk Paul Lafler, consultant and founder of Software Intelligence Corporation, has been a SAS user since 1979. As a SAS Certified Professional, Kirk provides IT consulting services and training to SAS users around the world. He is the author PROC SQL: Beyond the Basics Using SAS, Second Edition. He’s also written more than 100 peer-reviewed technical articles and writes the popular SAS tips column, "Kirk's Korner, " that appears in several SAS Users Group newsletters. Kirk is a frequent speaker at SAS Users Group meetings.

Related Posts

8 Comments

  1. Hi Guys,

    When I tried to use Filesize to calculate the size of the compressed datasets on a library(not SASHELP) using dictionary.tables I got an error in logs stating variable Filesize does not exist !!

    Can you please help me on this?

  2. Milorad Stojanovic on

    As usual Paul has a new and very useful stuff to offer to SAS community. I am SAS user for over 20 years but I didn't know for SIZEKMG format. Shame on me.
    Milorad

  3. Great tip, thank you, Kirk. Interestingly, the filesize column in sashelp.vextfl does not have units of measure specified, not even in the label. I assume it is Bytes. Correct?

  4. Randy Betancourt on

    Paul,

    Thank you. This is very handy. It suggests that with the correctly constructed LIBNAME statement that concatenates a series of paths, one could construct a schema report for all the model inputs SAS knows about back to the system of record inputs for not just file size, but attributes like datatype, names, formats, informats and so on.

    That would be handy for everyone.

    Best,
    Randy

  5. Very useful tips, thanks for sharing. As far as I can remember, SIZEKMG format was pre-production back in 9.1 then went into production status with 9.2 onwards. It's only limitation (I know) is - by definition - that it's capped to (ca) 1 Terabyte (exactly1.99999...x(2^^39) rounded to 1024Gb ~ 1 Tb). Now that data sometimes outsize this limit, sometimes custom "SIZEKMGT" Picture formats must be re-defined ...

  6. Louise Hadden on

    The SAS-supplied format is so useful! I will make sure to incorporate this in my work. This will also help in "checksum" diagnostics looking at files across server instances. Thanks!

Back to Top